# Tile Generation Benchmarks for a Zarr Pyramid

## Explanation

In this notebook we return to the CMIP6 data and compare the performance of generating tiles from the original Zarr dataset with generating them from a pyramid. This helps us understand how much a pyramid improves performance at lower zoom levels.
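
To see why lower zoom levels are where a pyramid should matter, here is a rough sketch of how many source columns a single tile spans at each zoom. It assumes a global grid 1440 columns wide (inferred from the `600_1440_1` chunk shape used later) and 256-pixel tiles. Neither value is stated in this notebook, and the function names are hypothetical.

```python
# Assumptions (not stated in the notebook): the source grid is 1440 columns
# wide and global in longitude, and tiles are 256 pixels wide.
SOURCE_WIDTH = 1440
TILE_SIZE = 256


def source_columns_per_tile(zoom: int, source_width: int = SOURCE_WIDTH) -> float:
    """Number of source columns spanned by one tile at the given zoom level."""
    # There are 2**zoom tiles across the globe, so each spans an equal share of columns.
    return source_width / 2 ** zoom


def downsample_factor(zoom: int) -> float:
    """Source columns per output pixel; values above 1 mean the tiler must downsample."""
    return source_columns_per_tile(zoom) / TILE_SIZE


for zoom in range(12):
    print(zoom, source_columns_per_tile(zoom), round(downsample_factor(zoom), 3))
```

Under these assumptions the factor is above 1 only for the first few zoom levels, which is where reading a pre-downsampled pyramid level would be expected to save the most work.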

## Setup

In [None]:
import json
import pandas as pd
from xarray_tile_test import XarrayTileTest
import sys; sys.path.append('..')
import helpers.eodc_hub_role as eodc_hub_role

In [None]:
credentials = eodc_hub_role.fetch_and_set_credentials()

We load the pyramid and the Zarr dataset that has the same chunk shape as the original dataset. We expect this Zarr dataset to perform about the same as the kerchunk reference.

In [None]:
iterations = 3
zooms = range(12)
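# Load the Zarr dataset specs and pick the one whose ID contains the original chunk shape
with open('../01-generate-datasets/cmip6-zarr-datasets.json') as f:
    zarr_datasets = json.load(f)
zarr_dataset_id, zarr_dataset = [(k, v) for k, v in zarr_datasets.items() if '600_1440_1' in k][0]
# Assumes the pyramid JSON is a mapping with a single {dataset_id: dataset} entry
with open('../01-generate-datasets/cmip6-pyramid-dataset.json') as f:
    pyramid_dataset_id, pyramid_dataset = list(json.load(f).items())[0]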
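
The selection in the cell above is meant to pick the entry whose ID contains the chunk-shape string. A minimal sketch of that lookup as a reusable function follows. `first_matching`, the example IDs, and the `dataset_url` field are all hypothetical and stand in for whatever the JSON files actually contain.

```python
def first_matching(datasets: dict, pattern: str) -> tuple:
    """Return the first (dataset_id, dataset) pair whose ID contains `pattern`."""
    for dataset_id, dataset in datasets.items():
        if pattern in dataset_id:
            return dataset_id, dataset
    # Fail loudly instead of raising an opaque IndexError on an empty list.
    raise KeyError(f"no dataset ID contains {pattern!r}")


# Placeholder entries for illustration only; the real IDs and fields come from the JSON files.
example_datasets = {
    "example_single_chunk": {"dataset_url": "s3://example-bucket/a.zarr"},
    "example_600_1440_1": {"dataset_url": "s3://example-bucket/b.zarr"},
}

dataset_id, dataset = first_matching(example_datasets, "600_1440_1")
print(dataset_id)
```

Raising a `KeyError` with the pattern in the message makes a missing or renamed dataset easier to diagnose than indexing into an empty list.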


## Run Tests

In [None]:
results = []

zarr_tile_test = XarrayTileTest(
    dataset_id=zarr_dataset_id,
    **zarr_dataset
)

# Run it 3 times for each zoom level
for zoom in zooms:
    zarr_tile_test.run_batch({'zoom': zoom}, batch_size=iterations)

results.append(zarr_tile_test.store_results(credentials))

pyramid_tile_test = XarrayTileTest(
    dataset_id=pyramid_dataset_id,
    **pyramid_dataset
)

# Run it 3 times for each zoom level
for zoom in zooms:
    pyramid_tile_test.run_batch({'zoom': zoom}, batch_size=iterations)

results.append(pyramid_tile_test.store_results(credentials))


## Read and Plot Results

In [None]:
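# See the code in run-xarray-tests.ipynb for reading the stored results;
# expanded_df, used in the cells below, is assumed to come from that code.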

In [None]:
expanded_df.plot.scatter(x='zoom', y='time', by='dataset_id')

In [None]:
expanded_df.results.to_csv('results/05-cmip6-pyramid-results.csv')
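
Alongside the scatter plot, a per-zoom summary can make the comparison easier to read. The sketch below assumes a data frame with the `zoom`, `time`, and `dataset_id` columns used in the plot above. The rows and the `zarr` and `pyramid` labels are placeholders for illustration, not benchmark results.

```python
import pandas as pd

# Placeholder timings for illustration only; these are not benchmark results.
example_df = pd.DataFrame({
    "dataset_id": ["zarr", "zarr", "zarr", "zarr",
                   "pyramid", "pyramid", "pyramid", "pyramid"],
    "zoom": [0, 0, 1, 1, 0, 0, 1, 1],
    "time": [4.0, 6.0, 2.0, 4.0, 1.0, 3.0, 1.0, 1.0],
})

# Median time per (zoom, dataset), pivoted so each dataset becomes a column.
summary = (
    example_df.groupby(["zoom", "dataset_id"])["time"]
    .median()
    .unstack("dataset_id")
)

# Ratio above 1 means the pyramid was faster at that zoom level.
summary["speedup"] = summary["zarr"] / summary["pyramid"]
print(summary)
```

The median is used here because a batch of only a few iterations per zoom level can be skewed by a single slow request.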