A limiting factor (for overview creation) is how fast a TIFF image file can be written out.

Here we attempt to benchmark that.

Note that %timeit averages over multiple iterations, whereas %time times a single run.
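Outside IPython the two magics are not available. A rough standard-library equivalent might look like the sketch below. The function `make_chunk` is a hypothetical stand-in for whatever is being timed, and the loop and repeat counts are arbitrary.

```python
import time
import timeit

def make_chunk(n=256):
    # Hypothetical workload standing in for the code being timed.
    return [float(i) for i in range(n * n)]

# Like %timeit: run the statement many times and report a per-loop time.
loops, repeats = 10, 5
totals = timeit.repeat(make_chunk, number=loops, repeat=repeats)
per_loop = [t / loops for t in totals]
print(f"best of {repeats}: {min(per_loop) * 1e3:.2f} ms per loop")

# Like %time: a single run. Only wall-clock time is measured here,
# whereas %time also reports CPU time.
start = time.perf_counter()
make_chunk()
single_run = time.perf_counter() - start
print(f"wall time: {single_run * 1e3:.2f} ms")
```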

In [4]:
import numpy as np, rasterio, tempfile

In [77]:
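size = 4096 * 4   # raster width and height in pixels (16384)
chunk = 512       # side length of each window written, in pixels
name = 'test.tif'

# Cost of generating one chunk of random data (names assigned inside %timeit are not kept).
%timeit data = np.random.random((chunk, chunk)).astype(np.float32)
# Trivially compressible alternative to the random data.
data = np.ones((chunk, chunk), dtype=np.float32)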

2.65 ms ± 25.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [81]:
! rm test.tif ; ls

pyramid.py  Test.ipynb


In [108]:
import numpy as np, rasterio
from rasterio.windows import Window

def f():
    # Tiled, LZW-compressed single-band GeoTIFF; num_threads lets GDAL compress blocks in parallel.
    with rasterio.open(name, 'w', driver='GTiff', width=size, height=size, count=1, dtype=np.float32,
                       tiled=True, blockxsize=256, blockysize=256, nodata=0,
                       compress='lzw', num_threads='all_cpus') as dst:
        # Window takes (col_off, row_off, width, height), so the outer loop walks columns.
        for col_off in range(0, size, chunk):
            for row_off in range(0, size, chunk):
                data = np.random.random((chunk, chunk)).astype(np.float32)
                dst.write(data, window=Window(col_off, row_off, chunk, chunk), indexes=1)
%time f()

CPU times: user 21.3 s, sys: 2.31 s, total: 23.6 s
Wall time: 10.9 s


In [93]:
! ls -alhF test.tif

-rw-r--r-- 1 brl654 u46 1.2G Feb 27 15:18 test.tif
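As a rough sanity check, the raw (uncompressed) array size and the implied write throughput can be worked out from `size`, the float32 dtype, and the 10.9 s wall time reported by `%time f()`. This is only arithmetic on figures already shown; the variable names are illustrative.

```python
size = 4096 * 4          # pixels per side, as in the notebook
bytes_per_pixel = 4      # float32
wall_time_s = 10.9       # wall time reported by %time f()

raw_bytes = size * size * bytes_per_pixel
raw_gib = raw_bytes / 2**30
throughput_mib_s = raw_bytes / 2**20 / wall_time_s

print(f"raw data: {raw_gib:.2f} GiB")
print(f"throughput: {throughput_mib_s:.0f} MiB/s of raw data")
```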


Results:

About 25 seconds to write a gigabyte of compressed random data in 500^2 chunks (not aligned with the file's internal blocks), covering a total extent of about 4x4 Albers tiles. Slightly increasing all dimensions, so that the same number of chunks aligned with the blocks, made no difference.

The same write takes about 7 seconds with compression disabled. (Replacing the random data with trivially compressible data gave similar timings. Note that completely random data does not actually compress at all.) This suggests that the compression computation costs far more than the write itself.
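The difference in compressibility between the two kinds of test data can be illustrated in memory, without touching a file. The sketch below uses `zlib` (deflate) purely as a convenient stand-in. It is not the LZW codec used for the GeoTIFF, so the exact ratios will differ, and random floats in [0, 1) retain a little redundancy in their sign and exponent bits. The helper `compression_ratio` is a name introduced here for illustration.

```python
import zlib
import numpy as np

chunk = 512
rng = np.random.default_rng(0)

random_chunk = rng.random((chunk, chunk)).astype(np.float32)
ones_chunk = np.ones((chunk, chunk), dtype=np.float32)

def compression_ratio(arr):
    # Compressed size divided by raw size; values near 1 mean little or no saving.
    raw = arr.tobytes()
    return len(zlib.compress(raw)) / len(raw)

ratio_random = compression_ratio(random_chunk)
ratio_ones = compression_ratio(ones_chunk)
print(f"random: {ratio_random:.3f}")
print(f"ones:   {ratio_ones:.5f}")
```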

This was for 4x4 100km tiles at 25m resolution, which is about 1% of the continent. This suggests the full continent could be written in about a dozen minutes (with no compression), or the full overview in 3-4 minutes, since the infinite sum of (1/4)^n for n >= 1 has limit 1/3. It would only be of order 100GB, and would potentially give faster read performance than if it were compressed. (At least locally on NCI. Over the web, it may depend on the effectiveness of compression built into the transfer protocols.)
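The extrapolation can be written out step by step. The sketch below uses only the figures quoted in these notes (7 seconds, 1% of the continent, float32 single band); the variable names are illustrative, and the result is an order-of-magnitude estimate rather than a measurement.

```python
tile_km = 100                # Albers tile side length
tiles_per_side = 4
resolution_m = 25
uncompressed_write_s = 7     # measured for the 4x4-tile test area
continent_fraction = 0.01    # test area is about 1% of the continent

pixels_per_side = tiles_per_side * tile_km * 1000 // resolution_m
test_area_gb = pixels_per_side**2 * 4 / 1e9   # float32, single band

continent_s = uncompressed_write_s / continent_fraction
continent_gb = test_area_gb / continent_fraction

# Each overview level has 1/4 the pixels of the level below it, so all
# levels together cost sum_{n>=1} (1/4)^n = 1/3 of the full-resolution write.
overview_fraction = sum(0.25**n for n in range(1, 30))
overview_s = continent_s * overview_fraction

print(f"test area: {pixels_per_side} px per side, {test_area_gb:.2f} GB")
print(f"continent: {continent_s / 60:.1f} min, {continent_gb:.0f} GB")
print(f"overviews: {overview_s / 60:.1f} min")
```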

The compressed write can be done in 11 seconds by passing the multithreading option (num_threads='all_cpus') to GDAL.

Previously, times of 5+ hours were observed for a continental RGB overview. Presumably this should be feasible in more like 20 minutes.
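One way to arrive at a figure of this order, offered as a guess at the reasoning rather than the stated working: scale the 11-second multithreaded compressed write by the same 1% area fraction, by the 1/3 overview factor, and by three bands for RGB. It assumes each band costs as much as the single float32 band in the test, which may well overestimate the cost for byte-valued imagery.

```python
compressed_write_s = 11      # multithreaded LZW write of the test area
continent_fraction = 0.01    # test area as a share of the continent
overview_factor = 1 / 3      # all overview levels relative to full resolution
bands = 3                    # RGB

rgb_overview_s = compressed_write_s / continent_fraction * overview_factor * bands
print(f"estimated RGB overview write: {rgb_overview_s / 60:.0f} min")
```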

