# Comparing Tiling across Data Formats

## Description

In this notebook we compare the performance of tiling CMIP6 data stored as COG, NetCDF and Zarr. To tile the NetCDF data, we use a kerchunk reference file. The ZarrReader can read NetCDF directly, but it cannot read more than one file at a time, which would make the results incomparable with the pgSTAC+COG and Zarr methods.
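
To make the role of the reference file more concrete, here is a minimal sketch of the general shape of a kerchunk version 1 reference set, in which Zarr chunk keys map to a `[url, offset, length]` triple pointing into the original files. The bucket, file names, variable name and byte ranges are hypothetical and are not taken from the dataset used in this notebook.

```python
import json

# Hypothetical kerchunk-style reference set (version 1 layout).
# Metadata keys hold inline JSON strings; chunk keys hold [url, offset, length].
references = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "tas/0.0.0": ["s3://example-bucket/tas_day_1950.nc", 30000, 1500000],
        "tas/1.0.0": ["s3://example-bucket/tas_day_1950.nc", 1530000, 1500000],
        "tas/365.0.0": ["s3://example-bucket/tas_day_1951.nc", 30000, 1500000],
    },
}


def source_files(refs):
    """Return the sorted set of files that the chunk references point into."""
    return sorted({value[0] for value in refs.values() if isinstance(value, list)})


files = source_files(references["refs"])
print(files)
```

Because chunk keys from several files live in one mapping, a single reference file can present many NetCDF files as one dataset, which is what makes the comparison with the other two methods possible.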

## Setup

In [None]:
import json
import sys
import pandas as pd
from cog_tile_test import CogTileTest
from xarray_tile_test import XarrayTileTest
# Make the parent directory importable so that `helpers` resolves
sys.path.append('..')
import helpers.eodc_hub_role as eodc_hub_role

In [None]:
credentials = eodc_hub_role.fetch_and_set_credentials()

Of the CMIP6 Zarr datasets, we load only the one that has the same chunk structure as the original NetCDF data.

In [None]:
# Run 3 iterations of each setting
iterations = 3
zooms = range(12)

def load_json(path):
    """Read a JSON file and close it afterwards."""
    with open(path) as f:
        return json.load(f)

# Take the first (dataset_id, dataset) entry from each file
# (this assumes the entry of interest comes first)
cog_dataset_id, cog_dataset = next(iter(load_json('../01-generate-datasets/cog-datasets.json').items()))
kerchunk_dataset_id, kerchunk_dataset = next(iter(load_json('../01-generate-datasets/cmip6-kerchunk-dataset.json').items()))
zarr_datasets = load_json('../01-generate-datasets/cmip6-zarr-datasets.json')
# Select the Zarr dataset whose ID contains '600_1440_1'
zarr_dataset_id, zarr_dataset = next((k, v) for k, v in zarr_datasets.items() if '600_1440_1' in k)
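
The selection step above relies on each JSON file mapping dataset IDs to their settings. As a minimal sketch of that lookup, assuming made-up IDs and a hypothetical `variable` setting rather than the real file contents:

```python
import json

# Hypothetical file contents: dataset IDs mapped to their settings.
raw = json.dumps({
    "example_store_100_100_1": {"variable": "tas"},
    "example_store_600_1440_1": {"variable": "tas"},
})
datasets = json.loads(raw)


def select_dataset(datasets, fragment):
    """Return the first (dataset_id, settings) pair whose ID contains `fragment`."""
    for dataset_id, settings in datasets.items():
        if fragment in dataset_id:
            return dataset_id, settings
    raise KeyError(f"no dataset ID contains {fragment!r}")


dataset_id, settings = select_dataset(datasets, "600_1440_1")
print(dataset_id, settings)
```

Raising `KeyError` when nothing matches is a design choice for this sketch: it fails loudly at load time instead of producing a confusing error later, when the test object is constructed.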

## Run Tests

### COG Tests

In [None]:
# Based on our findings in 01-cog-gdal-tests we run these tests with set_gdal_vars to True.
cog_tile_test = CogTileTest(
    dataset_id=cog_dataset_id,
    extra_args={
        'query': cog_dataset['example_query'],
        'set_gdal_vars': True
    }
)

# Run it 3 times for each zoom level
for zoom in zooms:
    cog_tile_test.run_batch({'zoom': zoom}, batch_size=iterations)

cog_results = cog_tile_test.store_results(credentials)
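
The internals of `CogTileTest.run_batch` are not shown in this notebook. As a rough illustration only of the pattern the loop follows (repeat a request a fixed number of times per zoom level and record each duration), here is a sketch in which `fake_tile_request` is a hypothetical stand-in for a real tile request. It is not the class's actual implementation.

```python
import time


def fake_tile_request(zoom):
    """Hypothetical stand-in for one tile request; does trivial work."""
    return sum(range(1000 * (zoom + 1)))


def run_batch(zoom, batch_size):
    """Time `batch_size` repeated calls at one zoom level, in milliseconds."""
    timings = []
    for _ in range(batch_size):
        start = time.perf_counter()
        fake_tile_request(zoom)
        timings.append((time.perf_counter() - start) * 1000)
    return timings


iterations = 3
zooms = range(12)
# One list of timings per zoom level
results = {zoom: run_batch(zoom, iterations) for zoom in zooms}
```

Keeping every individual timing, instead of only an average, lets a later scatter plot show the spread between repeated requests at the same zoom level.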

### Kerchunk Tests

In [None]:
kerchunk_tile_test = XarrayTileTest(
    dataset_id=kerchunk_dataset_id,
    **kerchunk_dataset
)

# Run it 3 times for each zoom level
for zoom in zooms:
    kerchunk_tile_test.run_batch({'zoom': zoom}, batch_size=iterations)

kerchunk_results = kerchunk_tile_test.store_results(credentials)

### Zarr Tests

In [None]:
zarr_tile_test = XarrayTileTest(
    dataset_id=zarr_dataset_id,
    **zarr_dataset
)

# Run it 3 times for each zoom level
for zoom in zooms:
    zarr_tile_test.run_batch({'zoom': zoom}, batch_size=iterations)

zarr_results = zarr_tile_test.store_results(credentials)

## Read and Plot Results

In [None]:
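# See the code in run-xarray-tests.ipynb for reading the stored results.
# `expanded_df`, used in the cells below, is assumed to be built there.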

In [None]:
expanded_df.plot.scatter(x='zoom', y='time', by='dataset_id')
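
Alongside the scatter plot, a per-dataset, per-zoom median can make the comparison easier to read. The sketch below assumes one row per request with `dataset_id`, `zoom` and `time` fields, as the plot call suggests. The rows are placeholders for illustration only and are not measurements from these tests.

```python
from collections import defaultdict
from statistics import median

# Placeholder rows for illustration only; the times are NOT measurements.
rows = [
    {"dataset_id": "dataset_a", "zoom": 0, "time": 10.0},
    {"dataset_id": "dataset_a", "zoom": 0, "time": 12.0},
    {"dataset_id": "dataset_a", "zoom": 0, "time": 11.0},
    {"dataset_id": "dataset_a", "zoom": 1, "time": 20.0},
    {"dataset_id": "dataset_a", "zoom": 1, "time": 22.0},
    {"dataset_id": "dataset_a", "zoom": 1, "time": 30.0},
    {"dataset_id": "dataset_b", "zoom": 0, "time": 5.0},
    {"dataset_id": "dataset_b", "zoom": 0, "time": 7.0},
    {"dataset_id": "dataset_b", "zoom": 0, "time": 6.0},
]

# Group the individual timings by (dataset_id, zoom)
grouped = defaultdict(list)
for row in rows:
    grouped[(row["dataset_id"], row["zoom"])].append(row["time"])

# Reduce each group to its median, which is less sensitive to a single slow request
medians = {key: median(times) for key, times in grouped.items()}

for (dataset_id, zoom), value in sorted(medians.items()):
    print(dataset_id, zoom, value)
```

The median is used here instead of the mean because, with only three iterations per setting, one slow request would otherwise dominate the summary.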

In [None]:
expanded_df.results.to_csv('results/cog-kerchunk-zarr-results.csv')