# Tile Generation Benchmarks for COGs with GDAL Environment Variables

## Explanation

In this notebook we demonstrate the importance of setting GDAL environment variables when using rasterio to read data from Cloud-Optimized GeoTIFFs (COGs). [titiler-pgstac](https://github.com/stac-utils/titiler-pgstac/) creates image tiles using [rio-tiler](https://github.com/cogeotiff/rio-tiler), which in turn uses [rasterio](https://github.com/rasterio/rasterio).

These GDAL variables are documented in the TiTiler performance tuning guide: https://developmentseed.org/titiler/advanced/performance_tuning/. The same explanations are reproduced as comments in cog_tile_test.py.
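
As a rough illustration of what "setting GDAL environment variables" means in practice, the sketch below temporarily exports a handful of options named in the TiTiler performance tuning guide and restores the previous environment afterwards. The helper `gdal_env` and the dictionary `GDAL_ENV` are hypothetical names introduced here, and the values are illustrative; the exact set applied by cog_tile_test.py is not shown in this notebook. GDAL falls back to the process environment for its configuration options, so they should be in place before the first dataset is opened.

```python
import os
from contextlib import contextmanager

# Illustrative values based on the TiTiler performance tuning guide.
# Assumption: the exact set used by cog_tile_test.py may differ.
GDAL_ENV = {
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",  # do not list sibling files on open
    "GDAL_HTTP_MERGE_CONSECUTIVE_RANGES": "YES",  # merge adjacent range requests
    "GDAL_HTTP_MULTIPLEX": "YES",                 # multiplex requests over HTTP/2
    "GDAL_HTTP_VERSION": "2",                     # request HTTP/2
    "VSI_CACHE": "TRUE",                          # enable the VSI file cache
    "VSI_CACHE_SIZE": "5000000",                  # VSI cache size in bytes
    "GDAL_CACHEMAX": "200",                       # raster block cache size in MB
}


@contextmanager
def gdal_env(overrides):
    """Temporarily set environment variables, restoring old values on exit."""
    previous = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, value in previous.items():
            if value is None:
                os.environ.pop(key, None)  # variable was unset before
            else:
                os.environ[key] = value    # put the old value back
```

A "set" run would then wrap the tile requests in `with gdal_env(GDAL_ENV): ...`, while an "unset" run would issue the same requests outside the block.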

For each zoom level from 0 through 5, we generate tiles in batches of 5 iterations, once with the GDAL environment variables set and once with them unset, and then display the results.
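
The measurement itself amounts to timing the same request several times per zoom level. The following is only a minimal sketch of that idea, not the implementation of `CogTileTest.run_batch`; `time_batch` and `fake_render` are hypothetical names, and `fake_render` sleeps instead of reading a COG.

```python
import time


def time_batch(render, zoom, batch_size):
    """Call render(zoom) batch_size times; return each duration in milliseconds."""
    timings_ms = []
    for _ in range(batch_size):
        start = time.perf_counter()
        render(zoom)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def fake_render(zoom):
    # Hypothetical stand-in for a tile request; it sleeps instead of reading data.
    time.sleep(0.001)


# Collect one list of timings per zoom level.
timings = {zoom: time_batch(fake_render, zoom, batch_size=3) for zoom in range(2)}
```

Keeping every individual timing, rather than only a mean, is what makes distribution plots such as box plots possible later on.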

## Setup

In [15]:
%load_ext autoreload
%autoreload
import json
import pandas as pd
from cog_tile_test import CogTileTest
import hvplot.pandas
import holoviews as hv
pd.options.plotting.backend = 'holoviews'

import sys; sys.path.append('..')
import helpers.eodc_hub_role as eodc_hub_role
import helpers.dataframe as dataframe_helpers
import warnings
warnings.filterwarnings('ignore')


In [2]:
credentials = eodc_hub_role.fetch_and_set_credentials()

In [3]:
# Run 5 iterations of each setting, for zoom levels 0 through 5
iterations = 5
zooms = range(6)
# Load the dataset definitions and use the first entry
with open('../01-generate-datasets/cmip6-pgstac/cog-datasets.json') as f:
    dataset_id, dataset = list(json.load(f).items())[0]

In [4]:
dataset

{'example_query': {'collections': ['CMIP6_daily_GISS-E2-1-G_tas'],
  'filter': {'op': 't_intersects',
   'args': [{'property': 'datetime'}, {'interval': ['1950-04-01T00:00:00Z']}]},
  'filter-lang': 'cql2-json'}}

## Run tests

In [9]:
shared_args = {
    'dataset_id': dataset_id,
    'lat_extent': [-59, 89],
    'lon_extent': [-179, 179],
    'extra_args': {
        'query': dataset['example_query'],
        'credentials': credentials
    }
}

In [6]:
# Create a test with gdal vars unset
shared_args['extra_args']['set_gdal_vars'] = False
cog_tile_test_unset = CogTileTest(**shared_args)

Caught exception: An error occurred (InvalidPermission.Duplicate) when calling the AuthorizeSecurityGroupIngress operation: the specified rule "peer: 34.214.6.100/32, TCP, from port: 5432, to port: 5432, ALLOW" already exists
Connected to database


In [7]:
for zoom in zooms:
    cog_tile_test_unset.run_batch({'zoom': zoom}, batch_size=iterations)

unset_results = cog_tile_test_unset.store_results(credentials)

Wrote instance data to s3://nasa-eodc-data-store/test-results/20230905154823_CogTileTest_CMIP6_daily_GISS-E2-1-G_tas.json


In [10]:
# Create a test with gdal vars SET
shared_args['extra_args']['set_gdal_vars'] = True
cog_tile_test_set = CogTileTest(**shared_args)

Caught exception: An error occurred (InvalidPermission.Duplicate) when calling the AuthorizeSecurityGroupIngress operation: the specified rule "peer: 34.214.6.100/32, TCP, from port: 5432, to port: 5432, ALLOW" already exists
Connected to database


In [11]:
for zoom in zooms:
    cog_tile_test_set.run_batch({'zoom': zoom}, batch_size=iterations)

set_results = cog_tile_test_set.store_results(credentials)

Wrote instance data to s3://nasa-eodc-data-store/test-results/20230905155004_CogTileTest_CMIP6_daily_GISS-E2-1-G_tas.json


## Read and plot results

In [12]:
# see code in run-xarray-tests.ipynb
results_urls = [unset_results, set_results]
results_df = dataframe_helpers.load_all_into_dataframe(credentials, results_urls)

In [21]:
expanded_df = dataframe_helpers.expand_timings(results_df)
expanded_df['set_gdal_vars'] = expanded_df['set_gdal_vars'].astype(str)


In [22]:
cmap = ["#E1BE6A", "#40B0A6"]
plt_opts = {"width": 300, "height": 250}

plts = []

for zoom_level in zooms:
    # Restrict each panel to the timings recorded at this zoom level
    df_level = expanded_df[expanded_df["zoom"] == zoom_level]
    plts.append(
        df_level.hvplot.box(
            y="time",
            by=["set_gdal_vars"],
            c="set_gdal_vars",
            cmap=cmap,
            ylabel="Time to render (ms)",
            xlabel="GDAL Environment Variables Set/Unset",
            legend=False,
            title=f"Zoom level {zoom_level}",
        ).opts(**plt_opts)
    )
hv.Layout(plts).cols(2)

In [23]:
expanded_df.to_csv('results-csvs/01-cog-gdal-results.csv')
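
To put numbers next to the box plots, the exported CSV can be summarized without pandas. This is a minimal sketch that assumes the CSV keeps the `zoom`, `set_gdal_vars` and `time` columns used above and that `zoom` is written as a number; `median_by_group` and `median_from_csv` are hypothetical helper names.

```python
import csv
from collections import defaultdict
from statistics import median


def median_by_group(rows):
    """Return {(zoom, set_gdal_vars): median time} for an iterable of dict rows."""
    groups = defaultdict(list)
    for row in rows:
        # CSV values arrive as strings, so convert before grouping.
        key = (int(float(row["zoom"])), str(row["set_gdal_vars"]))
        groups[key].append(float(row["time"]))
    return {key: median(values) for key, values in sorted(groups.items())}


def median_from_csv(path):
    """Read a results CSV and compute the median time per group."""
    with open(path, newline="") as f:
        return median_by_group(csv.DictReader(f))
```

For example, `median_from_csv('results-csvs/01-cog-gdal-results.csv')` would return one median per zoom level and setting, which makes the set/unset comparison easy to read off.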