# Profiling tiling code for pgSTAC + COG

A pgSTAC database stores STAC metadata about CMIP6 COGs on S3. The profiled code uses pgstac to read the STAC metadata and rio_tiler (which reads through rasterio) to read the COGs from S3.

In this notebook we load results from https://github.com/developmentseed/tile-benchmarking/blob/main/profiling/profile.ipynb to demonstrate:

1. GDAL environment variables are critical to performance.
2. Variation across tiles is not significant.
3. Tiling with pgSTAC + COGs is fast when compared with titiler-xarray tiling of Zarr stores.

In [3]:
import pandas as pd
import hvplot
pd.options.plotting.backend = 'holoviews'

git_url_path = "https://raw.githubusercontent.com/developmentseed/tile-benchmarking/feat/fake-data/profiling/results"
pd.read_csv(f"{git_url_path}/pgstac_cog_gdal_results.csv")

| Unnamed: 0 | gdal_vars_set? | tile times | mean total time |
|---|---|---|---|
| 0 | with_gdal_vars | [63.41, 54.53, 54.46] | 57.466667 |
| 1 | without_gdal_vars | [14687.78, 30817.34, 17722.72] | 21075.946667 |


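We don't need many iterations since the difference between the two configurations is so large. Setting the GDAL environment variables reduces the mean total time from about 21,076 to about 57, which is more than 350x faster on average and well over 100x faster even when comparing the slowest tuned run with the fastest untuned run.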
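The speedup can be checked directly from the numbers in the table above. This is a minimal sketch that copies the tile times by hand rather than parsing the CSV; the variable names are illustrative and not taken from the profiling notebook.

```python
from statistics import mean

# Tile times copied from the results table above
# (units as recorded in the CSV).
with_gdal_vars = [63.41, 54.53, 54.46]
without_gdal_vars = [14687.78, 30817.34, 17722.72]

# Ratio of the mean total times for the two configurations.
mean_speedup = mean(without_gdal_vars) / mean(with_gdal_vars)

# Most pessimistic pairing: fastest untuned run vs. slowest tuned run.
worst_case_speedup = min(without_gdal_vars) / max(with_gdal_vars)

print(f"mean speedup: {mean_speedup:.0f}x")
print(f"worst-case speedup: {worst_case_speedup:.0f}x")
```

Comparing the worst-case pairing as well as the means shows that the conclusion does not depend on which of the three runs is chosen.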

These GDAL variables are documented at https://developmentseed.org/titiler/advanced/performance_tuning/; the advice is summarized below for ease of reference.

By setting the GDAL environment variables we limit the total number of requests to S3.

Specifically, these environment variables ensure that:

* All of the metadata can usually be read in 1 request. This is not guaranteed, but it is more likely because we increase the number of bytes GDAL ingests when it first opens a file.
* There is no superfluous LIST request to account for sidecar files, which don't exist for COGs.
* Consecutive range requests are merged into 1 request.
* Multiple range requests use the same TCP connection.
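As a rough illustration of the four points above, the sketch below sets the corresponding GDAL configuration options as environment variables from Python. The option names are standard GDAL configuration options. The values follow the general TiTiler tuning advice and are assumptions, since the exact values used in the benchmark are not shown in this section. `GDAL_TUNING` and `apply_gdal_tuning` are hypothetical names introduced only for this example.

```python
import os

# Assumed values for illustration; the benchmark's exact settings
# are not listed in this section.
GDAL_TUNING = {
    # Read more bytes when a file is opened, so the COG header and
    # metadata are more likely to arrive in a single request.
    "GDAL_INGESTED_BYTES_AT_OPEN": "32768",
    # Skip the directory listing used to look for sidecar files.
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    # Merge consecutive range requests into one request.
    "GDAL_HTTP_MERGE_CONSECUTIVE_RANGES": "YES",
    # Send multiple range requests over one connection (needs HTTP/2).
    "GDAL_HTTP_MULTIPLEX": "YES",
    "GDAL_HTTP_VERSION": "2",
}


def apply_gdal_tuning(env=os.environ):
    """Copy the tuning options into `env` and return it."""
    env.update(GDAL_TUNING)
    return env


# Call this before any dataset is opened so GDAL picks the values up.
apply_gdal_tuning()
```

The same options could instead be scoped to a block of code with rasterio's `rasterio.Env(...)` context manager, which avoids changing the process-wide environment.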

# Time to create different tiles

The time to generate a tile varies very little from one tile to another.

In [56]:
df = pd.read_csv(f"{git_url_path}/pgstac_cog_tile_results.csv")
df.plot.scatter(x='xyz tile', y='mean total time', label='Mean Time to Tile (ms) by Zoom Level')
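To put a number on "very little", one option is the coefficient of variation (sample standard deviation divided by the mean) of the per-tile times. This is a minimal sketch using only the standard library. `coefficient_of_variation` is a hypothetical helper that is not part of the profiling code, and no result is quoted here because the CSV contents are not reproduced in this notebook.

```python
from statistics import mean, stdev


def coefficient_of_variation(times):
    """Return the sample standard deviation divided by the mean."""
    return stdev(times) / mean(times)


# Hypothetical usage with the DataFrame loaded above:
# coefficient_of_variation(df["mean total time"].tolist())
```

A value close to 0 would support the claim that tile-to-tile variation is negligible relative to the mean time.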