# Tile Generation Benchmarks for Various Numbers of Spatial Chunks

## Explanation

In this notebook, we compare tile-generation performance on artificially generated Zarr datasets whose chunk size is held constant, so that increasing the spatial resolution increases the number of chunks.

## Setup

In [1]:
%%capture
!pip install -r ../requirements.txt

In [7]:
import json
import sys
import warnings

import pandas as pd

# Make the shared helpers in the parent directory importable.
sys.path.append('..')
from xarray_tile_test import XarrayTileTest
import helpers.eodc_hub_role as eodc_hub_role
import helpers.dataframe as dataframe_helpers

# Ignore Python warnings to keep the notebook output readable.
warnings.filterwarnings('ignore')

In [3]:
credentials = eodc_hub_role.fetch_and_set_credentials()

Load the fake datasets, which have increasing numbers of chunks but share the same chunk size (32 MB).

In [4]:
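# Number of timed tile requests per (dataset, zoom level) setting
iterations = 3
# Zoom levels 0 through 11
zooms = range(12)
with open('../01-generate-datasets/fake-datasets.json') as f:
    all_zarr_datasets = json.load(f)
# Keep only the datasets generated for the chunk-count comparison
zarr_datasets = {k: v for k, v in all_zarr_datasets.items() if 'with_chunks' in k}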
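The dataset IDs in the S3 result paths written by the test run encode each grid's shape as `latNNNN_lonNNNN`. Assuming those names reflect the actual array dimensions, a quick check shows that each successive dataset roughly doubles the number of grid cells (each dimension grows by about √2), so with a fixed 32 MB chunk size the number of spatial chunks should also roughly double from one dataset to the next. The `grid_shape` helper is hypothetical and not part of the repository's helpers.

```python
import re

# Dataset IDs as they appear in the S3 result paths written by this notebook.
dataset_ids = [
    "with_chunks_store_lat1448_lon2896",
    "with_chunks_store_lat2048_lon4096",
    "with_chunks_store_lat2896_lon5792",
    "with_chunks_store_lat4096_lon8192",
    "with_chunks_store_lat5793_lon11586",
]


def grid_shape(dataset_id: str) -> tuple[int, int]:
    """Hypothetical helper: parse (n_lat, n_lon) from an ID like '..._lat2048_lon4096'."""
    match = re.search(r"lat(\d+)_lon(\d+)", dataset_id)
    if match is None:
        raise ValueError(f"no latNNN_lonNNN pattern in {dataset_id!r}")
    return int(match.group(1)), int(match.group(2))


# Total number of grid cells per dataset, and the growth factor between neighbours.
cells = [n_lat * n_lon for n_lat, n_lon in map(grid_shape, dataset_ids)]
growth = [later / earlier for earlier, later in zip(cells, cells[1:])]
print([round(g, 3) for g in growth])
```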

## Run Tests

In [5]:
results = []

for zarr_dataset_id, zarr_dataset in zarr_datasets.items():
    zarr_tile_test = XarrayTileTest(
        dataset_id=zarr_dataset_id,
        **zarr_dataset
    )

    # Time `iterations` tile requests at each zoom level
    for zoom in zooms:
        zarr_tile_test.run_batch({'zoom': zoom}, batch_size=iterations)

    results.append(zarr_tile_test.store_results(credentials))

/srv/conda/envs/notebook/lib/python3.10/site-packages/morecantile/models.py:474: PointOutsideTMSBounds: Point (65, 87) is outside TMS bounds [-180.0, -85.0511287798066, 180.0, 85.0511287798066].
/srv/conda/envs/notebook/lib/python3.10/site-packages/morecantile/models.py:474: PointOutsideTMSBounds: Point (15, -86) is outside TMS bounds [-180.0, -85.0511287798066, 180.0, 85.0511287798066].


Wrote instance data to s3://nasa-eodc-data-store/test-results/20230902205253_XarrayTileTest_with_chunks_store_lat1448_lon2896.zarr.json


/srv/conda/envs/notebook/lib/python3.10/site-packages/morecantile/models.py:474: PointOutsideTMSBounds: Point (160, -88) is outside TMS bounds [-180.0, -85.0511287798066, 180.0, 85.0511287798066].
/srv/conda/envs/notebook/lib/python3.10/site-packages/morecantile/models.py:474: PointOutsideTMSBounds: Point (-121, -86) is outside TMS bounds [-180.0, -85.0511287798066, 180.0, 85.0511287798066].


Wrote instance data to s3://nasa-eodc-data-store/test-results/20230902205307_XarrayTileTest_with_chunks_store_lat2048_lon4096.zarr.json


/srv/conda/envs/notebook/lib/python3.10/site-packages/morecantile/models.py:474: PointOutsideTMSBounds: Point (-43, 88) is outside TMS bounds [-180.0, -85.0511287798066, 180.0, 85.0511287798066].
/srv/conda/envs/notebook/lib/python3.10/site-packages/morecantile/models.py:474: PointOutsideTMSBounds: Point (-104, 89) is outside TMS bounds [-180.0, -85.0511287798066, 180.0, 85.0511287798066].
/srv/conda/envs/notebook/lib/python3.10/site-packages/morecantile/models.py:474: PointOutsideTMSBounds: Point (-25, -87) is outside TMS bounds [-180.0, -85.0511287798066, 180.0, 85.0511287798066].
/srv/conda/envs/notebook/lib/python3.10/site-packages/morecantile/models.py:474: PointOutsideTMSBounds: Point (-106, 88) is outside TMS bounds [-180.0, -85.0511287798066, 180.0, 85.0511287798066].
/srv/conda/envs/notebook/lib/python3.10/site-packages/morecantile/models.py:474: PointOutsideTMSBounds: Point (-38, -86) is outside TMS bounds [-180.0, -85.0511287798066, 180.0, 85.0511287798066].

Wrote instance data to s3://nasa-eodc-data-store/test-results/20230902205324_XarrayTileTest_with_chunks_store_lat2896_lon5792.zarr.json
Wrote instance data to s3://nasa-eodc-data-store/test-results/20230902205342_XarrayTileTest_with_chunks_store_lat4096_lon8192.zarr.json
Wrote instance data to s3://nasa-eodc-data-store/test-results/20230902205406_XarrayTileTest_with_chunks_store_lat5793_lon11586.zarr.json
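The `PointOutsideTMSBounds` warnings above are emitted by morecantile when a point requested during the test lies outside the tile matrix set's geographic bounds. The bounds printed in the warnings, ±180° longitude and ±85.0511287798066° latitude, match those of the Web Mercator grid, whose latitude limit is where the projected y coordinate reaches ±π, that is, lat = atan(sinh(π)). The sketch below recomputes that limit and checks a few of the logged points; `inside_web_mercator` is a hypothetical helper, not a morecantile function.

```python
import math

# Web Mercator stops at the latitude where the projected y coordinate equals ±π.
MAX_LAT = math.degrees(math.atan(math.sinh(math.pi)))


def inside_web_mercator(lon: float, lat: float) -> bool:
    """Hypothetical helper: True if (lon, lat) lies within the Web Mercator bounds."""
    return -180.0 <= lon <= 180.0 and -MAX_LAT <= lat <= MAX_LAT


# Points taken from the warnings logged during the test run, plus one valid point.
for lon, lat in [(65, 87), (15, -86), (-121, -86), (0, 0)]:
    print((lon, lat), inside_web_mercator(lon, lat))
print(f"{MAX_LAT:.13f}")
```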

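The internals of `XarrayTileTest` are not shown in this notebook. As a rough mental model of what the Run Tests loop measures, the class below is a hypothetical stand-in, not the repository's implementation: for each zoom level it calls a tile-rendering function `batch_size` times and records the wall-clock duration of each call with `time.perf_counter`. The `render_tile` callable and its workload are placeholders for whatever work the real test performs.

```python
import time
from typing import Callable


class TileTimingSketch:
    """Hypothetical benchmark harness; not the XarrayTileTest implementation."""

    def __init__(self, dataset_id: str, render_tile: Callable[[int], object]):
        self.dataset_id = dataset_id
        self.render_tile = render_tile
        # Maps zoom level -> list of elapsed seconds, one entry per iteration.
        self.timings: dict[int, list[float]] = {}

    def run_batch(self, params: dict, batch_size: int) -> None:
        zoom = params["zoom"]
        for _ in range(batch_size):
            start = time.perf_counter()
            self.render_tile(zoom)
            self.timings.setdefault(zoom, []).append(time.perf_counter() - start)


# Placeholder workload standing in for reading chunks and rendering a tile.
sketch = TileTimingSketch("toy", render_tile=lambda zoom: sum(range(10_000 * (zoom + 1))))
for zoom in range(12):
    sketch.run_batch({"zoom": zoom}, batch_size=3)
```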

## Read and Plot Results

In [8]:
all_df = dataframe_helpers.load_all_into_dataframe(credentials, results)
expanded_df = dataframe_helpers.expand_timings(all_df)
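`dataframe_helpers.expand_timings` is defined in the repository's helpers and is not shown here. Judging by the plot that follows, it yields one row per timed request with `time`, `zoom` and `number_of_spatial_chunks` columns. A minimal sketch of that kind of reshaping, assuming each input row stores a batch's timings as a list in a hypothetical `timings` column, uses pandas' `DataFrame.explode`. All values are placeholders, not benchmark results.

```python
import pandas as pd

# Placeholder input: one row per (dataset, zoom) batch with per-iteration timings.
# The 'timings' column name and all values are hypothetical.
batches = pd.DataFrame({
    "dataset_id": ["dataset_a", "dataset_a"],
    "number_of_spatial_chunks": [10, 10],
    "zoom": [0, 1],
    "timings": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
})


def expand_timings_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per individual timing, in a float 'time' column."""
    expanded = df.explode("timings", ignore_index=True)
    expanded = expanded.rename(columns={"timings": "time"})
    expanded["time"] = expanded["time"].astype(float)
    return expanded


expanded = expand_timings_sketch(batches)
print(expanded[["zoom", "time"]])
```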

In [10]:
import hvplot.pandas  # registers the .hvplot plotting accessor on pandas objects
expanded_df.hvplot.scatter(x='number_of_spatial_chunks', y='time', by='zoom')
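A scatter plot with several points per zoom level can be hard to read. One way to complement it, sketched here on placeholder data with the same column names as `expanded_df`, is a table of median time per (number of spatial chunks, zoom) pair built with `groupby` and `unstack`; the median is less sensitive than the mean to a single outlier request. Applying the same expression to `expanded_df` assumes its columns match those used in the plot.

```python
import pandas as pd

# Placeholder timings in the long format plotted above (values are made up).
df = pd.DataFrame({
    "number_of_spatial_chunks": [10, 10, 10, 10, 20, 20, 20, 20],
    "zoom": [0, 0, 1, 1, 0, 0, 1, 1],
    "time": [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0],
})

# Median time per (chunk count, zoom); rows are chunk counts, columns are zooms.
summary = (
    df.groupby(["number_of_spatial_chunks", "zoom"])["time"]
    .median()
    .unstack("zoom")
)
print(summary)
```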

In [13]:
expanded_df.to_csv('results-csvs/04-number-of-spatial-chunks-results.csv')
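`DataFrame.to_csv` writes the index as an unnamed first column by default, so when reading the saved results back for later comparison it helps to pass `index_col=0`; otherwise pandas adds an `Unnamed: 0` column. A small round trip through an in-memory buffer, using placeholder data, shows the difference.

```python
import io

import pandas as pd

df = pd.DataFrame({"zoom": [0, 1], "time": [1.0, 2.0]})  # placeholder values

buffer = io.StringIO()
df.to_csv(buffer)  # the index is written as the first, unnamed column

buffer.seek(0)
without_index_col = pd.read_csv(buffer)
buffer.seek(0)
with_index_col = pd.read_csv(buffer, index_col=0)

print(list(without_index_col.columns))
print(list(with_index_col.columns))
```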