Geospatial imagery can be extremely large, while machine learning models can only process relatively small inputs because of memory constraints. Even on a high-end graphics card with more than 24 GB of VRAM, running prediction on tiles larger than 2048 x 2048 px is difficult. As a result, tiled prediction is the preferred way to process large images.

We normally assume that CNNs are translation invariant. Roughly speaking, this means that if a feature is fully visible, it should be predicted with similar confidence whether it's at the edge or the centre of the image. In practice this is only true in the centre of a tile, since at the edges the network sees blank data and has less context to make the prediction. To avoid these edge effects, images are tiled in an overlapping fashion, with the overlap roughly equal to one "receptive field" of the network. Empirically, 512 px gives a good result in most cases. This logic doesn't necessarily hold for newer architectures based on Transformers, which don't have the same inductive biases as convolutional networks.
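
As a rough guide when choosing the overlap, the receptive field of a stack of convolution and pooling layers can be computed layer by layer. The sketch below uses the standard recurrence (each layer grows the receptive field by `kernel_size - 1` times the cumulative stride) and ignores dilation. The layer list is a made-up toy example, not the architecture used in this pipeline.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv/pool layers.

    `layers` is a sequence of (kernel_size, stride) pairs, ordered from
    input to output. Dilation is not modelled.
    """
    rf = 1    # a single input pixel only sees itself
    jump = 1  # spacing, in input pixels, between adjacent output pixels
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical toy network: three 3x3 convolutions, each followed by 2x2 pooling
toy_layers = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1), (2, 2)]
print(receptive_field(toy_layers))  # 22
```

Real segmentation backbones are much deeper, so their theoretical receptive field is far larger than this toy value; the 512 px figure above is an empirical choice rather than something derived from this formula.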

Our pipeline provides some simple utility functions to generate arbitrary-resolution tiles from a source orthomosaic. Since the model is trained at 10 cm/px, we (nominally) resample imagery to that resolution, though we also perform some augmentation at prediction time.

Drone orthomosaics often have a much higher resolution, at 1-5 cm/px, so we have to resample them. We provide two options:

 - Resize the image on the fly
 - Resample the image and store to disk
 
The first option is fast and works if you want to process an image once. If you think you're likely to experiment with different prediction settings, resampling as a one-time process might make more sense. Rescaling on the fly is also approximate and assumes that linear scaling is appropriate (this is normally true, if you look at the transformation from pixels -> world coordinates). You might find that you get different results if you compare both methods, but it shouldn't affect things much.
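
The linear-scaling assumption can be seen from the affine transform that maps pixel indices to world coordinates. Here is a minimal sketch, assuming a north-up image with square pixels and no rotation; the function name and the origin values are invented for illustration and are not part of the pipeline.

```python
def pixel_to_world(col, row, origin_x, origin_y, gsd):
    """Map a pixel (col, row) to world coordinates for a north-up raster.

    Assumes square pixels of size `gsd` (metres per pixel) and no rotation,
    so the transform is a pure scale plus offset. World y decreases as the
    row index increases.
    """
    x = origin_x + col * gsd
    y = origin_y - row * gsd
    return x, y


# Hypothetical top-left corner (metres) of a 0.1 m/px source image
origin_x, origin_y = 500000.0, 4200000.0

native = pixel_to_world(2048, 2048, origin_x, origin_y, gsd=0.1)
# Halving the pixel count while doubling the GSD reaches the same world point
resampled = pixel_to_world(1024, 1024, origin_x, origin_y, gsd=0.2)
print(native, resampled)
```

Because the mapping is linear, scaling the pixel grid by a constant factor is equivalent to changing the GSD by the inverse factor, which is what resizing on the fly relies on.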
 
To plug into machine learning pipelines, we provide a `TiledGeoImage` class that takes a single image as input and returns tile "samples". You can then iterate over this dataset sequentially to load all the tiles (e.g. you can pass it directly to a dataloader).

In [None]:
from tcd_pipeline.data.tiling import Tiler, TiledImage, TiledGeoImage

First, let's look at our helper class, `Tiler`, which generates a list of tiles for a given input. This class just takes an image size and some specifications for tile size and overlap.

In [None]:
tiler = Tiler(width=2048, height=2048, tile_size=1024, min_overlap=256)
list(tiler.tiles)

The default settings perform a naive tiling with a stride of (tile_size - overlap). To cover 2048 px along each axis with 1024 px tiles and _some_ overlap, we need at least 3 tiles per axis. The tiler also has an option to evenly distribute tiles subject to a _minimum overlap_:

In [None]:
tiler = Tiler(width=2048, height=2048, tile_size=1024, min_overlap=256, exact_overlap=False)
list(tiler.tiles)

The difference here is that we compute the minimum number of tiles required (again, 3) and distribute the tiles across the range. This has less of an impact for very large images, but you may find you prefer results from one approach over the other. The main advantage of this tiling strategy is that you don't need to worry about an odd-shaped tile at the edge of the image. It also provides more overlap for each tile, which may produce better predictions in those regions, but also requires some additional logic when merging results as the "requested" overlap isn't necessarily what the tiler will give you.
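
The two strategies can be sketched in one dimension. This is an illustration of the idea under simple assumptions, not the actual `Tiler` implementation; the function names are hypothetical and the real class may round or clip differently.

```python
import math


def naive_starts(length, tile_size, overlap):
    """Tile start offsets using a fixed stride of tile_size - overlap."""
    if overlap >= tile_size:
        raise ValueError("overlap must be smaller than tile_size")
    stride = tile_size - overlap
    starts = []
    pos = 0
    while True:
        starts.append(pos)
        if pos + tile_size >= length:
            break  # this tile reaches (or overhangs) the end of the image
        pos += stride
    return starts


def even_starts(length, tile_size, min_overlap):
    """Spread the minimum number of tiles evenly, ending exactly at `length`."""
    if min_overlap >= tile_size:
        raise ValueError("min_overlap must be smaller than tile_size")
    if length <= tile_size:
        return [0]
    stride = tile_size - min_overlap
    n_tiles = math.ceil((length - tile_size) / stride) + 1
    step = (length - tile_size) / (n_tiles - 1)
    return [round(i * step) for i in range(n_tiles)]


def actual_overlaps(starts, tile_size):
    """Overlap in pixels between each pair of neighbouring tiles."""
    return [tile_size - (b - a) for a, b in zip(starts, starts[1:])]


print(naive_starts(2048, 1024, 256))  # [0, 768, 1536]
print(even_starts(2048, 1024, 256))   # [0, 512, 1024]
print(actual_overlaps(even_starts(2048, 1024, 256), 1024))  # [512, 512]
```

In the naive case the last tile starts at 1536 and would run to 2560, so it overhangs the 2048 px image and has to be clipped or padded. In the even case every tile fits inside the image, but the overlap is 512 px rather than the requested 256 px, which is why merging code should use the overlap actually produced.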

Let's tile an image with the default settings - here we're ignoring geospatial information:

In [None]:
tiled_image = TiledImage("../data/5c15321f63d9810007f8b06f_10_00000.tif",
                        tile_size=1024,
                        overlap=256)
tiled_image.visualise(edges=True, boxes=True, midpoints=True)

With no overlap, we distribute 4 tiles over the image:

In [None]:
tiled_image = TiledImage("../data/5c15321f63d9810007f8b06f_10_00000.tif",
                        tile_size=1024,
                        overlap=0)
tiled_image.visualise(edges=True, boxes=True, midpoints=True)

If we use the `TiledGeoImage` class, we can also specify a target ground sample distance (GSD). For example, here we're asking for a GSD of 0.2 m/px, and as the image is 1024x1024 px at that resolution, we expect a single tile:

In [None]:
tiled_image = TiledGeoImage("../data/5c15321f63d9810007f8b06f_10_00000.tif",
                        tile_size=1024, overlap=10, target_gsd=0.2, pad_if_needed=False)

In [None]:
tiled_image.visualise(edges=True, boxes=True, midpoints=True)

Let's try on a larger image:

In [None]:
tiled_image = TiledGeoImage("../data/5f058f16ce2c9900068d83ed_10.tif",
                        tile_size=1024, overlap=256, target_gsd=0.2, pad_if_needed=False)

We expect the slice size to be 2048 px, as the image GSD is 0.1 m/px, but the final tiles emitted by the dataset should be 1024x1024 px.

In [None]:
for t in tiled_image.tiler.tiles:
    x, y = t
    assert x.stop-x.start == 2048, (x.stop-x.start)
    assert y.stop-y.start == 2048, (y.stop-y.start)
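
The 2048 px figure follows from the ratio of the target GSD to the source GSD. A minimal sketch of that arithmetic, using a hypothetical helper that is not part of the pipeline:

```python
def source_slice_size(tile_size, source_gsd, target_gsd):
    """Number of source pixels that must be read to produce one output tile.

    A tile of `tile_size` pixels at `target_gsd` covers
    tile_size * target_gsd metres, which spans that many metres divided by
    `source_gsd` pixels in the source image.
    """
    return round(tile_size * target_gsd / source_gsd)


print(source_slice_size(1024, source_gsd=0.1, target_gsd=0.2))  # 2048
# The same scaling applies to the overlap: 256 px at 0.2 m/px is 512 source px
print(source_slice_size(256, source_gsd=0.1, target_gsd=0.2))   # 512
```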

In [None]:
tiled_image.visualise(edges=True, boxes=True, midpoints=True)

In [None]:
tiled_image.visualise_tile(15)

What about a huge image, like all of Zurich?

Swisstopo provides tile data for the entire country, which is great, but the tiles don't overlap. For example here are some of the tiles:

In [None]:
!ls /media/josh/datasets/swisstopo/zurich_city_2022/ | head 

So we use GDAL to build a virtual raster (VRT):
```
gdalbuildvrt zurich_2022.vrt ./zurich_city_2022/*.tif
```

This is basically an index into the files that allows readers like `rasterio` (which uses `GDAL` under the hood) to convert a query into a file and offset lookup. The virtual file is mere kilobytes, even though a merged GeoTIFF would be gigabytes in size. We can then load this VRT into the tiler:
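
To make the "index" idea concrete, here is a toy sketch of the kind of lookup involved: given the extents of non-overlapping source tiles, find which files a query window touches. This is only an illustration and not how GDAL implements VRTs; the class, function names, file names and grid layout are all invented.

```python
from typing import NamedTuple


class SourceTile(NamedTuple):
    path: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def build_index(x0, y0, tile_extent, n_cols, n_rows):
    """Toy index of square, non-overlapping tiles laid out on a regular grid."""
    index = []
    for r in range(n_rows):
        for c in range(n_cols):
            xmin = x0 + c * tile_extent
            ymin = y0 + r * tile_extent
            index.append(
                SourceTile(f"tile_{c}_{r}.tif", xmin, ymin,
                           xmin + tile_extent, ymin + tile_extent)
            )
    return index


def files_for_window(index, xmin, ymin, xmax, ymax):
    """Return the source files whose extent intersects the query window."""
    return [
        t.path for t in index
        if t.xmin < xmax and t.xmax > xmin and t.ymin < ymax and t.ymax > ymin
    ]


# Hypothetical 3x3 grid of 1000 m tiles; the window straddles four of them
index = build_index(0.0, 0.0, 1000.0, n_cols=3, n_rows=3)
hits = files_for_window(index, 900.0, 900.0, 1100.0, 1100.0)
print(hits)
```

A window that crosses a tile boundary simply resolves to several files, which is how overlapping prediction tiles can be read from source tiles that don't overlap.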

In [None]:
tiled_image = TiledGeoImage("/media/josh/datasets/swisstopo/zurich_2022.vrt",
                        tile_size=1024, overlap=256, target_gsd=0.2, pad_if_needed=False)

In [None]:
len(tiled_image)

And if we load two adjacent tiles, we can see that overlap is handled correctly:

In [None]:
tiled_image.visualise_tile(5000)

In [None]:
tiled_image.visualise_tile(5001) # Shifted to the right

By using this strategy, you can scale data loading to enormous file sizes without needing to worry about memory constraints. You only need enough memory to load the index (negligible) and the tile itself.