# Tile Generation Benchmarks for a Zarr Pyramid

## Explanation

In this notebook we return to the CMIP6 data to compare the performance of tiling the original data against tiling a pyramid. This helps us quantify the performance improvement at lower zoom levels when a pyramid is available.
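As a rough illustration of why a pyramid should matter most at low zoom levels, the sketch below estimates how many full-resolution pixels fall under a single tile. It assumes a 600 x 1440 global grid (inferred from the `600_1440_1` dataset ID used later in this notebook) and 256 x 256 output tiles, and it ignores Web Mercator distortion, so the numbers are only indicative.

```python
# Assumptions (not taken from the benchmark code): a 600 x 1440 global grid,
# inferred from the '600_1440_1' dataset ID, and 256 x 256 pixel output tiles.
GRID_HEIGHT, GRID_WIDTH = 600, 1440
TILE_SIZE = 256


def source_pixels_per_tile(zoom):
    """Approximate number of full-resolution pixels under one tile."""
    tiles_per_axis = 2 ** zoom
    return (GRID_HEIGHT * GRID_WIDTH) // (tiles_per_axis ** 2)


for zoom in range(4):
    source = source_pixels_per_tile(zoom)
    ratio = source / (TILE_SIZE * TILE_SIZE)
    print(f"zoom {zoom}: {source} source pixels, {ratio:.2f}x the output tile")
```

Under these assumptions, a tile at zoom 0 or 1 covers more source pixels (864000 and 216000) than the 65536 pixels it outputs, so the tiler reads and downsamples data that a precomputed pyramid level would already provide at a coarser resolution.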

## Setup

In [30]:
%load_ext autoreload
%autoreload

# External modules
import hvplot.pandas
import holoviews as hv
import json
import pandas as pd
pd.options.plotting.backend = 'holoviews'
import warnings
warnings.filterwarnings('ignore')

# Local modules
import sys; sys.path.append('..')
import helpers.eodc_hub_role as eodc_hub_role
import helpers.dataframe as dataframe_helpers
from xarray_tile_test import XarrayTileTest


In [2]:
credentials = eodc_hub_role.fetch_and_set_credentials()

We load the pyramid and the Zarr dataset that has the same chunk shape as the original dataset. We expect this Zarr dataset and the kerchunk version to perform about the same.

In [6]:
iterations = 1
zooms = range(4)
with open('../01-generate-datasets/cmip6-zarr-datasets.json') as f:
    cmip6_zarr_datasets = json.load(f)
# Select the first Zarr dataset whose ID contains the '600_1440_1' chunk shape
zarr_dataset_id, zarr_dataset = next((k, v) for k, v in cmip6_zarr_datasets.items() if '600_1440_1' in k)
with open('../01-generate-datasets/cmip6-pyramid-dataset.json') as f:
    pyramid_dataset_id, pyramid_dataset = next(iter(json.load(f).items()))

## Run Tests

In [None]:
results = []

zarr_tile_test = XarrayTileTest(
    dataset_id=zarr_dataset_id,
    **zarr_dataset
)

# Run each zoom level `iterations` times
for zoom in zooms:
    zarr_tile_test.run_batch({'zoom': zoom}, batch_size=iterations)

results.append(zarr_tile_test.store_results(credentials))
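The batch loop above times repeated tile requests per zoom level. The following is a minimal sketch of that general pattern, not the `XarrayTileTest` implementation: `time_batch` and `fake_tile_request` are hypothetical names introduced only for illustration.

```python
import statistics
import time


def time_batch(func, batch_size):
    """Call func batch_size times and return each call's elapsed milliseconds."""
    timings_ms = []
    for _ in range(batch_size):
        start = time.perf_counter()
        func()
        timings_ms.append((time.perf_counter() - start) * 1000)
    return timings_ms


def fake_tile_request():
    # Hypothetical stand-in workload; a real test would render a tile here
    sum(range(10_000))


timings = time_batch(fake_tile_request, batch_size=3)
print(f"median: {statistics.median(timings):.3f} ms over {len(timings)} runs")
```

With `iterations = 1` each zoom level yields a single sample, so one slow request can dominate the comparison. Raising `iterations` and comparing medians is one way to make the results less sensitive to outliers.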

In [31]:
pyramid_tile_test = XarrayTileTest(
    dataset_id=pyramid_dataset_id,
    lat_extent=[-59, 89],
    lon_extent=[-179, 179],    
    **pyramid_dataset
)

# Run each zoom level `iterations` times
for zoom in zooms:
    pyramid_tile_test.run_batch({'zoom': zoom}, batch_size=iterations)

results.append(pyramid_tile_test.store_results(credentials))

TileOutsideBounds: Tile 2/0/2 is outside bounds

## Read and Plot Results
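The plotting cell expects a table with one row per timed tile request and `zoom`, `time`, and `dataset_id` columns. The sketch below shows one way such rows could be flattened from per-dataset results. The record layout, the `expand_timings` name, and the numbers are all hypothetical placeholders, not the stored results format or real measurements.

```python
def expand_timings(records):
    """Flatten per-dataset records into one row per timed tile request."""
    rows = []
    for record in records:
        for timing in record["timings"]:
            rows.append({
                "dataset_id": record["dataset_id"],
                "zoom": timing["zoom"],
                "time": timing["time"],
            })
    return rows


# Placeholder values for illustration only; these are not benchmark results
example_records = [
    {"dataset_id": "zarr", "timings": [{"zoom": 0, "time": 1.0}, {"zoom": 1, "time": 2.0}]},
    {"dataset_id": "pyramid", "timings": [{"zoom": 0, "time": 3.0}]},
]

rows = expand_timings(example_records)
print(rows)
```

Passing `rows` to `pd.DataFrame` would give a frame with the three columns that the scatter plot groups and plots by.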

In [None]:
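# `expanded_df` is assumed to hold one row per tile request, built from `results`,
# with `zoom`, `time`, and `dataset_id` columns
expanded_df.plot.scatter(x='zoom', y='time', by='dataset_id')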

In [None]:
expanded_df.results.to_csv('results/05-cmip6-pyramid-results.csv')