# Comparing dev and feature

This notebook compares results between dev and feature titiler deployments. Running end-to-end benchmarks is documented in [https://github.com/developmentseed/tile-benchmarking/tree/main/03-e2e/README.md](https://github.com/developmentseed/tile-benchmarking/tree/main/03-e2e/README.md).

This notebook compares titiler-xarray's dev branch at [commit 9ac1686612d](https://github.com/developmentseed/titiler-xarray/commit/9ac1686612d706e0f078a418818b16544efb11c0) with a feature deployment that includes [diskcache](https://github.com/developmentseed/titiler-xarray/tree/feat/diskcache) and a second feature deployment (feature2) that includes [fsspec's filecache using EFS](https://github.com/developmentseed/titiler-xarray/tree/feat/fsspec-filecache). Results from a third feature deployment (feature3) are also downloaded and compared below.

In [10]:
# Import libraries
import os
import pandas as pd
import hvplot.pandas
import holoviews as hv
pd.options.plotting.backend = 'holoviews'
import warnings
warnings.filterwarnings('ignore')
import sys
sys.path.append('..')
from helpers import dataframe
# You will need to set credentials to access nasa-eodc-data-store
from helpers import eodc_hub_role
credentials = eodc_hub_role.fetch_and_set_credentials()

In [11]:
# Remove any previous results
!rm -rf downloaded_dev_results/
!rm -rf downloaded_feature*_results/

In [12]:
%%capture
!aws s3 cp --recursive s3://nasa-eodc-data-store/tile-benchmarking-results/dev_2023-10-25_17-11-17/ downloaded_dev_results/
!aws s3 cp --recursive s3://nasa-eodc-data-store/tile-benchmarking-results/feature_2023-10-25_17-14-44/ downloaded_feature_results/
!aws s3 cp --recursive s3://nasa-eodc-data-store/tile-benchmarking-results/feature2_2023-10-25_17-16-13/ downloaded_feature2_results/
!aws s3 cp --recursive s3://nasa-eodc-data-store/tile-benchmarking-results/feature3_2023-10-25_20-35-19/ downloaded_feature3_results/ 

Parse and merge results into a single dataframe.

In [13]:
results = { 'feature3': {}, 'feature2': {}, 'feature': {}, 'dev': {} }
for env in results.keys():
    # Specify the directory path and the suffix
    directory_path = f"downloaded_{env}_results/"
    suffix = "_urls_stats.csv"  # Per-dataset stats files end with this suffix

    # List all files in the directory
    all_files = os.listdir(directory_path)

    # Filter the files to only include those that end with the specified suffix
    files_with_suffix = [f"{directory_path}{f}" for f in all_files if f.endswith(suffix)]

    dfs = []
    for file in files_with_suffix:
        df = pd.read_csv(file)
        df['file'] = file
        dfs.append(df)

    merged_df = pd.concat(dfs)
    merged_df['dataset'] = [file.split('/')[1].replace('_urls_stats.csv', '') for file in merged_df['file']]
    results[env]['all'] = merged_df
    # The "Aggregated" results represent aggregations across tile endpoints. 
    results[env][f'Aggregated {env}'] = merged_df[merged_df['Name'] == 'Aggregated']
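
The parsing loop above can be exercised without the S3 download. The following is a minimal sketch, assuming a throwaway directory with two made-up stats files. The file contents and the helper name `load_stats` are hypothetical; only the `_urls_stats.csv` suffix and the `Name == 'Aggregated'` filter come from the cell above. It derives the dataset name from the file name instead of from the position of the directory in the path, which is one way to make the step less sensitive to where the results were downloaded.

```python
import tempfile
from pathlib import Path

import pandas as pd

SUFFIX = "_urls_stats.csv"


def load_stats(directory):
    """Hypothetical helper: read every *_urls_stats.csv in `directory` into one frame."""
    frames = []
    for path in sorted(Path(directory).glob(f"*{SUFFIX}")):
        df = pd.read_csv(path)
        # Derive the dataset name from the file name rather than from the path depth.
        df["dataset"] = path.name[: -len(SUFFIX)]
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Build two tiny, made-up stats files to run the helper against.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.zarr", "b.zarr"):
        pd.DataFrame(
            {
                "Name": ["/tiles/0/0/0", "Aggregated"],
                "Request Count": [10, 10],
                "Failure Count": [0, 0],
            }
        ).to_csv(Path(tmp) / f"{name}{SUFFIX}", index=False)
    demo_all = load_stats(tmp)

# Keep only the rows that aggregate across tile endpoints.
demo_aggregated = demo_all[demo_all["Name"] == "Aggregated"]
print(demo_aggregated[["dataset", "Request Count"]])
```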

In [14]:
dataset_specs_all = dataframe.csv_to_pandas('zarr_info.csv')
#dataset_specs_all

In [15]:
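# Copy the slices so the column renames below do not modify the frames stored in `results`
dev_df = results['dev']['Aggregated dev'].copy()
feature_df = results['feature']['Aggregated feature'].copy()
feature2_df = results['feature2']['Aggregated feature2'].copy()
feature3_df = results['feature3']['Aggregated feature3'].copy()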
feature2_df.columns = ['dataset' if col == 'dataset' else col + ' Feature2' for col in feature2_df.columns]
feature3_df.columns = ['dataset' if col == 'dataset' else col + ' Feature3' for col in feature3_df.columns]

merged_df = pd.merge(dev_df, feature_df,  on='dataset', suffixes=(' Dev', ' Feature'))
merged_df = pd.merge(merged_df, feature2_df, on='dataset', how='outer')
merged_df = pd.merge(merged_df, feature3_df, on='dataset', how='outer')
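
The cell above mixes two ways of labelling columns. `pd.merge` only applies `suffixes` to column names that overlap between the two inputs, so it works for the first merge, where dev and feature share every column name. After that merge the names no longer overlap with a third frame, which is presumably why the feature2 and feature3 columns are renamed by hand first. A minimal sketch with made-up numbers (the three small frames are hypothetical, not benchmark results):

```python
import pandas as pd

dev = pd.DataFrame({"dataset": ["a", "b"], "Average Response Time": [300.0, 500.0]})
feature = pd.DataFrame({"dataset": ["a", "b"], "Average Response Time": [200.0, 400.0]})
feature2 = pd.DataFrame({"dataset": ["a"], "Average Response Time": [100.0]})

# Label feature2's columns by hand; "dataset" stays untouched because it is the join key.
feature2 = feature2.rename(columns=lambda c: c if c == "dataset" else f"{c} Feature2")

# Overlapping column names receive the suffixes here.
merged = pd.merge(dev, feature, on="dataset", suffixes=(" Dev", " Feature"))
# An outer join keeps dataset "b" even though feature2 has no row for it.
merged = pd.merge(merged, feature2, on="dataset", how="outer")
print(merged)
```

With `how='outer'`, a dataset missing from one deployment shows up as `NaN` in that deployment's columns rather than being dropped.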

In [16]:
merged_df['Failure Rate Dev'] = merged_df['Failure Count Dev']/merged_df['Request Count Dev'] * 100
merged_df['Failure Rate Feature'] = merged_df['Failure Count Feature']/merged_df['Request Count Feature'] * 100
merged_df['Failure Rate Feature2'] = merged_df['Failure Count Feature2']/merged_df['Request Count Feature2'] * 100
merged_df['Failure Rate Feature3'] = merged_df['Failure Count Feature3']/merged_df['Request Count Feature3'] * 100

summary_df = merged_df[
    [
        'Average Response Time Dev', 'Failure Rate Dev',
        'Average Response Time Feature', 'Failure Rate Feature',
        'Average Response Time Feature2', 'Failure Rate Feature2',
        'Average Response Time Feature3', 'Failure Rate Feature3',        
        'dataset'
    ]
].sort_values('Average Response Time Dev')
merged_specs = summary_df.merge(dataset_specs_all, left_on='dataset', right_on='collection_name')

In [17]:
merged_specs

Unnamed: 0,Average Response Time Dev,Failure Rate Dev,Average Response Time Feature,Failure Rate Feature,Average Response Time Feature2,Failure Rate Feature2,Average Response Time Feature3,Failure Rate Feature3,dataset,collection_name,source,chunks,shape_dict,dtype,chunk_size_mb,compression,number_of_spatial_chunks,number_coordinate_chunks
0,321.826667,0.0,192.325557,0.0,127.1873,0.0,202.135762,0.0,single_chunk_store_lat512_lon1024.zarr,single_chunk_store_lat512_lon1024.zarr,s3://nasa-eodc-data-store/test-data/fake-data/...,"{'y': 1, 'x': 512}","{'y': 512, 'x': 1024}",float64,4.0,"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, ...",1024.0,2
1,342.073466,0.0,208.963919,0.0,133.297186,0.0,233.642259,0.0,single_chunk_store_lat724_lon1448.zarr,single_chunk_store_lat724_lon1448.zarr,s3://nasa-eodc-data-store/test-data/fake-data/...,"{'y': 1, 'x': 724}","{'y': 724, 'x': 1448}",float64,7.998291,"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, ...",1448.0,2
2,542.160083,0.0,381.187136,0.0,189.931547,0.0,451.617459,0.0,with_chunks_store_lat2048_lon4096.zarr,with_chunks_store_lat2048_lon4096.zarr,s3://nasa-eodc-data-store/test-data/fake-data/...,"{'y': 1, 'x': 1448}","{'y': 2048, 'x': 4096}",float64,31.993164,"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, ...",5793.237569,2
3,574.09902,0.0,458.537647,0.0,171.657782,0.0,470.593928,0.0,with_chunks_store_lat1448_lon2896.zarr,with_chunks_store_lat1448_lon2896.zarr,s3://nasa-eodc-data-store/test-data/fake-data/...,"{'y': 1, 'x': 1448}","{'y': 1448, 'x': 2896}",float64,31.993164,"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, ...",2896.0,2
4,586.357011,0.0,475.301659,0.0,164.773804,0.0,516.897948,0.0,single_chunk_store_lat1448_lon2896.zarr,single_chunk_store_lat1448_lon2896.zarr,s3://nasa-eodc-data-store/test-data/fake-data/...,"{'y': 1, 'x': 1448}","{'y': 1448, 'x': 2896}",float64,31.993164,"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, ...",2896.0,2
5,830.197589,0.0,481.579497,0.0,260.615841,0.0,736.179473,0.0,with_chunks_store_lat2896_lon5792.zarr,with_chunks_store_lat2896_lon5792.zarr,s3://nasa-eodc-data-store/test-data/fake-data/...,"{'y': 1, 'x': 1448}","{'y': 2896, 'x': 5792}",float64,31.993164,"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, ...",11584.0,2
6,989.751051,0.0,580.48251,0.0,312.82641,0.0,882.376206,0.0,with_chunks_store_lat5793_lon11586.zarr,with_chunks_store_lat5793_lon11586.zarr,s3://nasa-eodc-data-store/test-data/fake-data/...,"{'y': 1, 'x': 1448}","{'y': 5793, 'x': 11586}",float64,31.993164,"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, ...",46352.001381,2
7,1013.2578,0.0,923.859218,0.0,672.690603,0.0,292.574906,0.0,single_chunk_store_lat1024_lon2048.zarr,single_chunk_store_lat1024_lon2048.zarr,s3://nasa-eodc-data-store/test-data/fake-data/...,"{'y': 1, 'x': 1024}","{'y': 1024, 'x': 2048}",float64,16.0,"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, ...",2048.0,2
8,1054.990834,0.0,846.457988,0.0,246.217022,0.0,902.697445,0.0,single_chunk_store_lat2048_lon4096.zarr,single_chunk_store_lat2048_lon4096.zarr,s3://nasa-eodc-data-store/test-data/fake-data/...,"{'y': 1, 'x': 2048}","{'y': 2048, 'x': 4096}",float64,64.0,"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, ...",4096.0,2
9,1072.402135,0.0,525.009207,0.0,265.299231,0.0,885.512392,0.0,with_chunks_store_lat4096_lon8192.zarr,with_chunks_store_lat4096_lon8192.zarr,s3://nasa-eodc-data-store/test-data/fake-data/...,"{'y': 1, 'x': 1448}","{'y': 4096, 'x': 8192}",float64,31.993164,"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, ...",23172.950276,2
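
One way to read the table is to express each feature deployment's average response time relative to dev. The following is a minimal sketch using three rows copied from the table above; the variable names and the choice of percentage change as the metric are assumptions for illustration, not part of the benchmark tooling.

```python
import pandas as pd

# Three rows copied from the merged_specs output above.
sample = pd.DataFrame(
    {
        "dataset": [
            "single_chunk_store_lat512_lon1024.zarr",
            "single_chunk_store_lat1024_lon2048.zarr",
            "with_chunks_store_lat4096_lon8192.zarr",
        ],
        "Average Response Time Dev": [321.826667, 1013.2578, 1072.402135],
        "Average Response Time Feature": [192.325557, 923.859218, 525.009207],
        "Average Response Time Feature2": [127.1873, 672.690603, 265.299231],
        "Average Response Time Feature3": [202.135762, 292.574906, 885.512392],
    }
).set_index("dataset")

# Shorten the column names to the deployment label.
times = sample.rename(columns=lambda c: c.replace("Average Response Time ", ""))

# Percentage change relative to dev; negative values mean faster than dev.
pct_change = (
    times.drop(columns="Dev").sub(times["Dev"], axis=0).div(times["Dev"], axis=0) * 100
)

# Deployment with the lowest average response time for each dataset.
fastest = times.idxmin(axis=1)

print(pct_change.round(1))
print(fastest)
```

The same calculation can be applied to the full `merged_specs` frame by selecting the four `Average Response Time` columns instead of the hand-copied sample.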


NOTE: We don't have chunk information for the prod giovanni cache dataset because it is protected (it can be added).

In [19]:
ylim = 3000
xlim = 256
dev_line = merged_specs.sort_values('chunk_size_mb').hvplot.line(
    x='chunk_size_mb', y='Average Response Time Dev', label='Dev', color='cyan',
    xlim=(0, xlim), ylim=(0, ylim)
)

# Feature deployment (diskcache)
feature_line = merged_specs.sort_values('chunk_size_mb').hvplot.line(
    x='chunk_size_mb', y='Average Response Time Feature', label='Feature', color='magenta', alpha=0.4,
    xlim=(0, xlim), ylim=(0, ylim)
)

feature2_line = merged_specs.sort_values('chunk_size_mb').hvplot.line(
    x='chunk_size_mb', y='Average Response Time Feature2', label='Feature2', color='orange', alpha=0.4,
    xlim=(0, xlim), ylim=(0, ylim)
)

feature3_line = merged_specs.sort_values('chunk_size_mb').hvplot.line(
    x='chunk_size_mb', y='Average Response Time Feature3', label='Feature3', color='green', alpha=0.4,
    xlim=(0, xlim), ylim=(0, ylim)
)

# Overlay the four line plots
combined_plot = dev_line * feature_line * feature2_line * feature3_line
combined_plot