## Bespoke S3 TIFF reader

Rather than using rasterio/GDAL, we have a custom reader that uses `botocore` to fetch data from an S3 bucket, parses the TIFF header, then reads the requested chunk. Overall this amounts to 2 GET requests per image for pixel-drill-style access: one for the header and one for the chunk.
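The two-request access pattern can be sketched as follows. This is not the reader used in the benchmark. It is a minimal illustration assuming a classic (non-BigTIFF) tiled uint8 TIFF whose first IFD and tile offset arrays fit in the first `header_bytes` of the file, which is what a COG layout provides. `fetch(offset, length)` is a hypothetical callable standing in for a ranged `botocore` GetObject, and `parse_ifd0`/`read_tile` are illustrative names.

```python
import struct
import zlib
from typing import Callable, Dict, Tuple

import numpy as np

# Hypothetical fetch callable: fetch(offset, length) -> bytes. Against S3 this
# would be a ranged GetObject (HTTP header "Range: bytes=start-end").
Fetch = Callable[[int, int], bytes]

# Baseline TIFF tags this sketch needs (tag id -> name).
TAGS = {256: "width", 257: "height", 259: "compression",
        322: "tile_width", 323: "tile_length",
        324: "tile_offsets", 325: "tile_byte_counts"}
TYPES = {3: "H", 4: "I"}  # TIFF field types SHORT and LONG


def parse_ifd0(buf: bytes) -> Dict[str, Tuple[int, ...]]:
    """Parse the first IFD of a classic (non-BigTIFF) TIFF from its leading bytes."""
    bo = {b"II": "<", b"MM": ">"}[buf[:2]]          # byte order mark
    magic, ifd = struct.unpack(bo + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("classic TIFF only")
    (n,) = struct.unpack(bo + "H", buf[ifd:ifd + 2])
    info = {}
    for e in range(ifd + 2, ifd + 2 + 12 * n, 12):   # 12-byte IFD entries
        tag, typ, count = struct.unpack(bo + "HHI", buf[e:e + 8])
        if tag not in TAGS or typ not in TYPES:
            continue
        fmt = f"{bo}{count}{TYPES[typ]}"
        size = struct.calcsize(fmt)
        if size <= 4:                                # value stored inline
            data = buf[e + 8:e + 8 + size]
        else:                                        # value stored at an offset
            (off,) = struct.unpack(bo + "I", buf[e + 8:e + 12])
            data = buf[off:off + size]               # assumed to lie inside buf
        info[TAGS[tag]] = struct.unpack(fmt, data)
    return info


def read_tile(fetch: Fetch, row: int, col: int, header_bytes: int = 64 * 1024):
    """Read one uint8 tile of band 1 using two range requests."""
    info = parse_ifd0(fetch(0, header_bytes))        # request 1: header
    if info["compression"][0] not in (8, 32946):     # DEFLATE only
        raise ValueError("unsupported compression")
    tw, th = info["tile_width"][0], info["tile_length"][0]
    tiles_across = -(-info["width"][0] // tw)        # ceiling division
    idx = row * tiles_across + col                   # tiles are stored row-major
    off, size = info["tile_offsets"][idx], info["tile_byte_counts"][idx]
    raw = zlib.decompress(fetch(off, size))          # request 2: tile
    return np.frombuffer(raw, dtype=np.uint8).reshape(th, tw)
```

The 64 KiB header guess is an assumption: if the IFD or offset arrays extend past it, a real reader would need a follow-up request, which is one reason the limitations below restrict input to COGs.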

Reader limitations:

- Tiled images only (COG only)
- DEFLATE compression only
- No predictor support (expected to be added, as predictors are common with 16-bit images)
- Only reads band 1 (support for multi-band images is easy to add)
- Doesn't interpret georeferencing metadata
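Adding predictor support would mostly mean undoing TIFF horizontal differencing (`Predictor=2`) after decompression. The sketch below is a hypothetical helper, not part of the current reader. It assumes a single-band tile whose decompressed bytes have already been viewed with the file's byte order (e.g. `'<u2'` for little-endian 16-bit data).

```python
import numpy as np


def undo_horizontal_predictor(tile: np.ndarray) -> np.ndarray:
    """Reverse TIFF Predictor=2 (horizontal differencing) for a single-band tile.

    Each pixel was stored as the difference from its left neighbour, modulo
    2**bits, so a running sum along each row restores the original values.
    Passing dtype=tile.dtype keeps the accumulation in the tile's integer type,
    so it wraps around exactly like the encoder's subtraction did.
    """
    return np.cumsum(tile, axis=1, dtype=tile.dtype)


# Usage: row [10, 2, 65535] decodes to [10, 12, 11] for uint16 (12 + 65535 wraps to 11)
print(undo_horizontal_predictor(np.array([[10, 2, 65535]], dtype=np.uint16)))
```

For multi-band pixel-interleaved data the differencing stride is the number of samples per pixel, so this single-band version would need extending along with multi-band support.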

## Experiment Setup

- WOfS LS5, full-history query
- Tile -9 -18, chunk (8,2)
- Image Properties
    - 4000x4000 single band uint8
    - Chunk size 256x256
    - No predictor (pixel differencing) applied before compression
    - DEFLATE (ZIP) compression, level 9, with GeoTIFF-only metadata
- 1416 time slices
- Access one chunk from each time slice
- m5.xlarge instance (4 vCPUs, 16 GiB RAM)
- Chunk with the largest compressed size was chosen
- S3 bucket and EC2 instance both in the Sydney region (ap-southeast-2)
- "Random" 5-char prefix added to file names
- Data location
    - http://dea-public-data.s3-website-ap-southeast-2.amazonaws.com/?prefix=bench-data/
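The setup parameters determine how many tiles each image has and how many GET requests a full run issues. The arithmetic below assumes chunk `(8,2)` means `(row, col)` and that tiles are stored in row-major order, as the TIFF specification requires.

```python
import math

# Image and tiling parameters from the experiment setup
width = height = 4000
tile = 256
n_slices = 1416

tiles_across = math.ceil(width / tile)    # edge tiles are padded to full size
tiles_down = math.ceil(height / tile)
# Assumption: chunk (8, 2) is (row, col) in a row-major tile grid
tile_index = 8 * tiles_across + 2
tile_bytes = tile * tile                  # uncompressed bytes, uint8, one band
bespoke_gets = 2 * n_slices               # header + chunk for every time slice
print(tiles_across, tiles_down, tile_index, tile_bytes, bespoke_gets)
# → 16 16 130 65536 2832
```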

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
import pickle
from utils import bench

## Load data

In [None]:
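files = ['./results/M5XL_ZIP2R_-9_-18b8_2B1__01_001.pickle',  # GDAL, 1 thread
         './results/M5XL_ZIP2_-9_-18b8_2B1__01_001.pickle',   # bespoke, 1 thread
         './results/M5XL_ZIP2R_-9_-18b8_2B1__16_001.pickle',  # GDAL, 16 threads (fastest total completion)
         './results/M5XL_ZIP2_-9_-18b8_2B1__20_001.pickle',   # bespoke, 20 threads (fastest total completion)
         ]

def load_pickle(path):
    # Context manager closes each results file once it has been unpickled
    with open(path, 'rb') as f:
        return pickle.load(f)

dd = [load_pickle(file) for file in files]
sts = [bench.unpack_stats(d, ms=True) for d in dd]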

## Comparison GDAL vs Bespoke (single thread)

In [None]:
reports = (bench.gen_stats_report(dd[0], 'GDAL'),
           bench.gen_stats_report(dd[1], 'BESPOKE'))
print(bench.join_reports(*reports))
fig = plt.figure(figsize=(12,6))
bench.plot_comparison(fig, sts[:2], 
                      names=['GDAL', 'BESPOKE'], 
                      threshs=[400, 250, 80],
                      nochunk=True,)

## Comparison GDAL vs Bespoke (fastest parallel run)

In [None]:
reports = (bench.gen_stats_report(dd[2], 'GDAL'),
           bench.gen_stats_report(dd[3], 'BESPOKE'))
print(bench.join_reports(*reports))
fig = plt.figure(figsize=(12,6))
bench.plot_comparison(fig, sts[2:4], 
                      names=['GDAL', 'BESPOKE'], 
                      threshs=[400, 250, 80],
                      nochunk=True,)

## Analysis

The bespoke implementation issues fewer S3 GET requests than the GDAL implementation and, as a result, loads significantly faster overall: in the parallel case it is more than **2 times** faster, a **61%** reduction in total latency.

Single-thread performance is **36.7%** faster for the bespoke implementation. More importantly, the bespoke approach scales better in parallel because it issues fewer S3 GET requests: in the fastest case it completes in **5.82s** using 20 threads, while the GDAL implementation's fastest time is **14.96s** using 16 threads.
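The parallel figures quoted above follow directly from the two fastest total completion times:

```python
gdal_s, bespoke_s = 14.96, 5.82          # fastest runs: GDAL 16 threads, bespoke 20 threads
speedup = gdal_s / bespoke_s             # how many times faster the bespoke run is
reduction = 1 - bespoke_s / gdal_s       # fraction of total latency removed
print(f"{speedup:.2f}x faster, {reduction:.1%} less total latency")
# → 2.57x faster, 61.1% less total latency
```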

The bespoke approach is significantly faster when reading the image header, but quite a bit slower when reading pixel data. This could be due to the fairly unoptimized data retrieval code, which makes more copies than necessary (`http -> zlib -> temp -> numpy`). It should be fairly simple to implement a custom decompression pipeline that writes data directly into its final destination, avoiding the extra copy.
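One way to drop the temporary array is sketched below, assuming all tiles are 256x256 and decoded into a preallocated `(time, y, x)` stack. `decode_into` is a hypothetical helper. Note that the standard-library `zlib` module always returns a new `bytes` object, so this removes the `temp` step but not the decompressor's own output buffer; writing straight into the destination would need a lower-level decompression binding.

```python
import zlib

import numpy as np

TILE = 256  # tile edge in pixels, as in the experiment setup


def decode_into(out: np.ndarray, i: int, compressed: bytes) -> None:
    """Decompress one DEFLATE tile and store it in slot ``i`` of ``out``.

    np.frombuffer only wraps the decompressed bytes (no copy), so the single
    copy is the assignment into the preallocated destination array.
    """
    nbytes = TILE * TILE * out.dtype.itemsize
    raw = zlib.decompress(compressed, bufsize=nbytes)  # size hint avoids regrowing
    out[i] = np.frombuffer(raw, dtype=out.dtype).reshape(TILE, TILE)


# Usage: allocate the final stack once, then fill it one time slice at a time.
rng = np.random.default_rng(0)
tiles = [rng.integers(0, 2, (TILE, TILE), dtype=np.uint8) for _ in range(3)]
stack = np.empty((len(tiles), TILE, TILE), dtype=np.uint8)
for i, t in enumerate(tiles):
    decode_into(stack, i, zlib.compress(t.tobytes(), 9))
```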

## Future work

Using a bespoke TIFF reader allows further experimentation:

- Caching image metadata in a database to avoid loading header data at all, potentially halving access time
- Using asyncio instead of, or in combination with, multi-threading to achieve higher I/O performance
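Both ideas can be explored together in a small simulation. The sketch below is hypothetical throughout: `fetch` only sleeps to mimic network latency (a real version might use an async S3 client such as aiobotocore), the dict stands in for the metadata DB, and the cached tile offset and size are placeholder values. It shows how a warm metadata cache drops the request count from 2 to 1 per image while a semaphore bounds concurrency.

```python
import asyncio
from typing import Dict, Tuple

# Hypothetical stand-in for the metadata DB: file key -> (tile offset, tile size).
MetaCache = Dict[str, Tuple[int, int]]


async def fetch(key: str, offset: int, size: int, stats: Dict[str, int]) -> bytes:
    """Simulated ranged S3 GET: counts the request and sleeps instead of doing I/O."""
    stats["gets"] += 1
    await asyncio.sleep(0.001)
    return bytes(size)


async def read_chunk(key: str, cache: MetaCache, sem: asyncio.Semaphore,
                     stats: Dict[str, int]) -> bytes:
    async with sem:                                   # cap in-flight requests
        if key not in cache:
            await fetch(key, 0, 16 * 1024, stats)     # header GET (parsing elided)
            cache[key] = (16 * 1024, 4096)            # placeholder offset/size
        offset, size = cache[key]
        return await fetch(key, offset, size, stats)  # chunk GET


async def read_all(keys, cache: MetaCache, max_in_flight: int = 32):
    sem = asyncio.Semaphore(max_in_flight)
    stats = {"gets": 0}
    chunks = await asyncio.gather(*(read_chunk(k, cache, sem, stats) for k in keys))
    return chunks, stats["gets"]


keys = [f"slice-{i:04d}.tif" for i in range(1416)]   # one file per time slice
cache: MetaCache = {}
_, cold = asyncio.run(read_all(keys, cache))          # header + chunk per image
_, warm = asyncio.run(read_all(keys, cache))          # chunk only, metadata cached
print(cold, warm)
# → 2832 1416
```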
