## Experiment Setup

- WOfS LS5, full-history query
- Tile -9 -18, chunk (8,2)
- Image Properties
    - 4000x4000 single band uint8
    - Chunk size 256x256
    - DEFLATE lvl 9, no differencing
- 1416 time slices
- Access one chunk from each time slice
- m5.xlarge instance: 4 vCPUs, 16 GiB RAM
- Chunk with largest compressed size was chosen
- S3 bucket and EC2 both in Sydney region
- "Random" 5-char prefix added to file names
- http://dea-public-data.s3-website-ap-southeast-2.amazonaws.com/?prefix=bench-data/
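
The image properties above imply a fixed tile grid. A minimal sketch of that arithmetic, assuming edge tiles are stored at full 256x256 size as is usual for tiled TIFFs (variable names are illustrative, not taken from the benchmark code):

```python
import math

width = height = 4000      # pixels, from the setup above
tile = 256                 # chunk (tile) edge length in pixels
bytes_per_pixel = 1        # single band uint8

tiles_across = math.ceil(width / tile)    # 4000 / 256 = 15.625, rounded up
tiles_down = math.ceil(height / tile)
tiles_per_image = tiles_across * tiles_down

# Size of one tile before DEFLATE compression
raw_tile_bytes = tile * tile * bytes_per_pixel

print(tiles_across, tiles_down, tiles_per_image, raw_tile_bytes)
```

Each image therefore holds 256 tiles of 64 KiB uncompressed, and the benchmark reads exactly one of them per time slice.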

## Bespoke S3 TIFF reader

Rather than using rasterio/GDAL, we use a custom reader that fetches data from the S3 bucket with `botocore`, parses the TIFF header, then reads the requested chunk. For pixel-drill style access this amounts to 2 requests per image: one for the header and one for the chunk.

Reader limitations:

- Tiled images only (COG only)
- DEFLATE compression only
- No predictor support (expected to be added, since predictors are common with 16-bit images)
- Only reads band 1 (easy to add support for multi-band images)
- Doesn't interpret georeferencing data
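
The two-request access pattern can be sketched as below. This is not the reader used in the benchmark: `read_tile` and `fetch` are hypothetical names, with `fetch(offset, size)` standing in for an S3 ranged GET. The sketch assumes a classic (non-BigTIFF) file, `TileOffsets` and `TileByteCounts` stored as 32-bit LONG values, and an IFD plus tile tables that fit inside the first header read.

```python
import struct
import zlib

TILE_OFFSETS, TILE_BYTE_COUNTS = 324, 325   # TIFF tag ids
TIFF_LONG = 4                               # TIFF field type: 32-bit unsigned


def read_tile(fetch, tile_idx, header_size=4096):
    """Return the decompressed bytes of one tile using two fetch() calls.

    fetch(offset, size) -> bytes is a hypothetical stand-in for a ranged GET.
    """
    hdr = fetch(0, header_size)                      # request 1: header + IFD
    bo = {b'II': '<', b'MM': '>'}[hdr[:2]]           # byte order mark
    magic, ifd_offset = struct.unpack(bo + 'HI', hdr[2:8])
    if magic != 42:
        raise ValueError('not a classic TIFF file')

    # Each IFD entry is 12 bytes: tag, field type, value count, value/offset
    (n_entries,) = struct.unpack_from(bo + 'H', hdr, ifd_offset)
    tags = {}
    for i in range(n_entries):
        tag, ftype, count, value = struct.unpack_from(
            bo + 'HHII', hdr, ifd_offset + 2 + 12 * i)
        tags[tag] = (ftype, count, value)

    def long_value(tag, idx):
        ftype, count, value = tags[tag]
        if ftype != TIFF_LONG:
            raise NotImplementedError('sketch handles LONG values only')
        if count == 1:
            return value                             # stored inline in the entry
        return struct.unpack_from(bo + 'I', hdr, value + 4 * idx)[0]

    offset = long_value(TILE_OFFSETS, tile_idx)
    nbytes = long_value(TILE_BYTE_COUNTS, tile_idx)
    raw = fetch(offset, nbytes)                      # request 2: the tile itself
    return zlib.decompress(raw)                      # DEFLATE, no predictor
```

With `botocore`, `fetch` could be implemented as a `get_object` call with a `Range='bytes=start-end'` argument; that wiring is left out here so the sketch runs without network access.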

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
import pickle
from types import SimpleNamespace
from utils import bench

## Load data

In [None]:
import glob
files = sorted(glob.glob('./results/M5XL_ZIP2_-9_-18*_001.pickle'))
# Load each result file, closing the file handle after reading
dd = []
for file in files:
    with open(file, 'rb') as f:
        dd.append(pickle.load(f))

## Scaling with more threads

In [None]:
fig = plt.figure(figsize=(12,6))
best_idx = bench.plot_stats_results(dd, fig=fig)

## In depth stats for single threaded case

In [None]:
print(bench.gen_stats_report(dd[0]))
fig = plt.figure(figsize=(12,6))
bench.plot_results(dd[0].stats, fig=fig);

## Analysis

- Significantly lower open costs than the GDAL-based solution
- Open costs are slightly higher than read costs
  - Open reads a 4K chunk at the start of the file, which is smaller than the smallest data chunk, yet it is still slower
  - Most likely the first access to a file costs more than the second
- Scales well with more processing workers
  - Limited by latency, not throughput
  - Likely limited by the number of requests per second (just under 500 requests per second)
  - For data with larger chunks, throughput will probably matter more
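
A rough back-of-envelope check of the request-rate limit, using only figures quoted in this notebook. The 500 requests per second ceiling is approximate, and the payload figure is an upper bound that assumes a compressed tile is no larger than its raw size:

```python
time_slices = 1416           # images in the full-history query
requests_per_image = 2       # header read + chunk read
max_request_rate = 500       # requests/s, approximate observed ceiling

total_requests = time_slices * requests_per_image
min_wall_time = total_requests / max_request_rate          # seconds
max_images_per_second = max_request_rate / requests_per_image

raw_tile_bytes = 256 * 256   # one uint8 tile before compression
payload_bound = max_images_per_second * raw_tile_bytes     # bytes/s

print(f'{total_requests} requests, at least {min_wall_time:.2f} s per run')
print(f'at most {max_images_per_second:.0f} chunks/s, '
      f'under {payload_bound / 1e6:.1f} MB/s of chunk data')
```

At this request rate the chunk payload stays below roughly 16.4 MB/s, which is consistent with the observation that the workload is bound by request latency rather than bandwidth.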

