# Recommendations

## Data Format

* **Data Format:** At this time, COG + pgSTAC tiling performs better than tiling Zarr or kerchunk references, at all zoom levels.
* **Kerchunk Reference Files:** Tiling from a kerchunk reference can perform as well as or better than tiling from a zarr store. This comparison applies when the NetCDF files' chunks are the same as those of the zarr store version.

## Zarr-specific Recommendations

* **Ensure no zarr coordinate chunking:** Ensure coordinate data is not chunked. Chunked coordinates cause more files to be opened during `xarray.open_dataset`, which significantly degrades performance.
* **Smaller chunk sizes perform better:** Chunk size significantly impacts performance. A specific recommendation depends on the performance requirements of the application.
* **Fewer spatial chunks perform better:** A greater number of spatial chunks degrades performance, especially at low zoom levels, where more chunks must be loaded to cover the greater spatial extent of each tile.
* **Pyramids improve performance for high resolution datasets:** High resolution datasets suffer from having large chunks, many chunks, or both. To provide a good experience, zarr data can be aggregated into multiscale datasets, otherwise known as pyramids.
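
To illustrate why many spatial chunks hurt at low zoom levels, here is a minimal sketch that counts how many chunks of a global grid overlap a single Web Mercator (XYZ) tile. It is not part of titiler-xarray: the helper names `tile_bounds` and `chunks_touched` are hypothetical, and the grid is assumed to be a regular latitude/longitude grid spanning the whole globe with row 0 at the north edge.

```python
import math


def tile_bounds(z, x, y):
    """Return (west, south, east, north) in degrees for a Web Mercator XYZ tile."""
    n = 2 ** z

    def lat(row):
        # Latitude of the northern edge of tile row `row`
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return west, lat(y + 1), east, lat(y)


def chunks_touched(z, x, y, shape, chunks):
    """Count the chunks of a global (lat, lon) grid that overlap one tile.

    Assumption: a regular grid covering -90..90 and -180..180,
    with row 0 at the north edge and column 0 at the west edge.
    """
    nlat, nlon = shape
    clat, clon = chunks
    west, south, east, north = tile_bounds(z, x, y)
    # First and last pixel column/row covered by the tile
    col0 = int((west + 180.0) / 360.0 * nlon)
    col1 = min(nlon - 1, math.ceil((east + 180.0) / 360.0 * nlon) - 1)
    row0 = int((90.0 - north) / 180.0 * nlat)
    row1 = min(nlat - 1, math.ceil((90.0 - south) / 180.0 * nlat) - 1)
    # Number of chunk columns and rows spanned by those pixel ranges
    n_lon_chunks = col1 // clon - col0 // clon + 1
    n_lat_chunks = row1 // clat - row0 // clat + 1
    return n_lat_chunks * n_lon_chunks


# Example: a 2048 x 4096 global grid stored as 512 x 512 chunks
for zoom, tx, ty in [(0, 0, 0), (3, 4, 3)]:
    count = chunks_touched(zoom, tx, ty, shape=(2048, 4096), chunks=(512, 512))
    print(f"zoom {zoom}, tile ({tx}, {ty}): {count} chunk(s)")
```

In this sketch the single zoom 0 tile overlaps every chunk in the store, while a zoom 3 tile overlaps only a few, which is the pattern the recommendation above describes.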

## What is high resolution?

Given the current performance of titiler-xarray shown in [tile-server-e2e-benchmarks.ipynb](./tile-server-e2e-benchmarks.ipynb), and assuming you are targeting 300ms or less, we suggest targeting chunks of 8MB or smaller.

To give a sense of what this means in terms of spatial resolution, and assuming the full spatial extent is stored in a single chunk, you would have the following dimensions of your dataset:

```python
import numpy as np

datatypes = ["float16", "float32", "float64"]
total_global_chunk_size_mb = 8
# Approximate length of one degree at the equator, in meters
meters_per_degree_lat = 111000
meters_per_degree_lon = 111320

for data_type in datatypes:
    # Determine the size in bytes of each data value
    dtype = np.dtype(data_type)
    # Calculate the itemsize in megabytes
    itemsize_mb = dtype.itemsize / 1024 / 1024
    # A global grid has twice as many longitude cells as latitude cells
    lat_dim = np.sqrt(total_global_chunk_size_mb / 2 / itemsize_mb)
    lon_dim = lat_dim * 2
    # Cell size in degrees
    lat_deg = 180 / lat_dim
    lon_deg = 360 / lon_dim
    print(f"For data type {dtype}, an 8MB spatial dataset would have:")
    print(f"* Dimensions: {np.round(lat_dim)} latitude x {np.round(lon_dim)} longitude")
    print(f"* Degrees: {np.round(lat_deg, 3)} x {np.round(lon_deg, 3)}")
    print(f"* Kilometers at the equator: {np.round(lat_deg * meters_per_degree_lat / 1000, 1)} x {np.round(lon_deg * meters_per_degree_lon / 1000, 1)}\n")
```

```
For data type float16, an 8MB spatial dataset would have:
* Dimensions: 1448.0 latitude x 2896.0 longitude
* Degrees: 0.124 x 0.124
* Kilometers at the equator: 13.8 x 13.8

For data type float32, an 8MB spatial dataset would have:
* Dimensions: 1024.0 latitude x 2048.0 longitude
* Degrees: 0.176 x 0.176
* Kilometers at the equator: 19.5 x 19.6

For data type float64, an 8MB spatial dataset would have:
* Dimensions: 724.0 latitude x 1448.0 longitude
* Degrees: 0.249 x 0.249
* Kilometers at the equator: 27.6 x 27.7
```



If your dataset has a higher resolution than those listed above, you will want to chunk your data spatially, create a pyramid, or both. Spatially chunked data can also hurt performance at low zoom levels, so you should try chunks significantly smaller than 8MB, say 4MB. If the full spatial extent of your data is larger than 16MB, you will probably want to create a pyramid.
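
As a rough planning aid, the sketch below estimates how many pyramid levels a dataset would need before its full spatial extent fits within a size budget. It rests on assumptions not stated above: each level halves both spatial dimensions, sizes are uncompressed, and the 8MB default is taken from the chunk size suggested earlier. The function names `chunk_size_mb` and `pyramid_levels` are hypothetical.

```python
import math


def chunk_size_mb(shape, itemsize):
    """Uncompressed size in MB (1 MB = 1024 * 1024 bytes) of an array or chunk."""
    return math.prod(shape) * itemsize / 1024 / 1024


def pyramid_levels(shape, itemsize, target_mb=8.0):
    """List (level, shape, size_mb) from finest to coarsest.

    Assumption: each level halves both spatial dimensions (rounding up),
    and levels are added until the full extent fits within target_mb.
    """
    levels = []
    nlat, nlon = shape
    level = 0
    while True:
        size = chunk_size_mb((nlat, nlon), itemsize)
        levels.append((level, (nlat, nlon), size))
        # Stop once the whole extent fits the budget or cannot be coarsened further
        if size <= target_mb or min(nlat, nlon) == 1:
            return levels
        nlat, nlon = math.ceil(nlat / 2), math.ceil(nlon / 2)
        level += 1


# Example: a global 0.01 degree grid of float32 values (4 bytes each)
for level, level_shape, size in pyramid_levels((18000, 36000), itemsize=4):
    print(f"level {level}: {level_shape[0]} x {level_shape[1]}, {size:.1f} MB")
```

The number of levels returned is only a starting point: the finer levels would still need spatial chunking, and compression changes the size actually read from storage.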