# GHS-composite-S2 R2020A

This section covers creating a local mirror of the UK portion of the cloud-free Sentinel-2 composite mosaic created by the European Commission. The official website is:

> https://ghsl.jrc.ec.europa.eu/ghs_s2composite.php

The paper describing the dataset is:

> Corbane, C., Politis, P., Kempeneers, P., Simonetti, D., Soille, P., Burger, A., ... & Kemper, T. (2020). [A global cloud free pixel-based image composite from Sentinel-2 data](https://www.sciencedirect.com/science/article/pii/S2352340920306314). *Data in Brief*, 105737.

In [1]:
import sys
sys.path.insert(0, "../")
import utils
import rasterio
import rioxarray
import datashader as ds

## Local paths

Before we get into any processing, let's set up the local directories for the downloads:

In [2]:
local_dir = "../../data"

## Tile listing

According to Corbane et al. (2020), the dataset is split into UTM tiles, each of them further subdivided into smaller GeoTIFF files. The tiles required to cover Great Britain are:

In [3]:
utm_tiles = ["30U", "31U", "29V", "30V"]

Similarly, the base URL for the dataset's FTP is:

In [4]:
base_url = (
    "http://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/"
    "GHSL/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A/"
    "GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A_UTM_10/V1-0/"
)
base_url

'http://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A_UTM_10/V1-0/'

## Virtual rasters

Each UTM tile has a `.vrt` file, an XML file that lists each of the tile's component files. For example, for the 30U tile, it is available at:

In [5]:
p = f"{base_url}{utm_tiles[0]}/{utm_tiles[0]}_UTM.vrt"
p

'http://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A_UTM_10/V1-0/30U/30U_UTM.vrt'
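To make the idea of "an XML file that lists each component" concrete, here is a minimal sketch that pulls the component file names out of a VRT with the standard library. The XML string is a made-up sample, not the contents of the real `30U_UTM.vrt`, and `vrt_sources` is a hypothetical helper; only the element names (`VRTDataset`, `VRTRasterBand`, `SourceFilename`) follow the GDAL VRT format.

```python
import xml.etree.ElementTree as ET

# Illustrative VRT fragment (an assumption, not the real tile index)
sample_vrt = """<VRTDataset rasterXSize="200" rasterYSize="100">
  <VRTRasterBand dataType="UInt16" band="1">
    <SimpleSource>
      <SourceFilename relativeToVRT="1">part_a.tif</SourceFilename>
    </SimpleSource>
    <SimpleSource>
      <SourceFilename relativeToVRT="1">part_b.tif</SourceFilename>
    </SimpleSource>
  </VRTRasterBand>
  <VRTRasterBand dataType="UInt16" band="2">
    <SimpleSource>
      <SourceFilename relativeToVRT="1">part_a.tif</SourceFilename>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>"""


def vrt_sources(xml_text):
    """Return the unique source file names in a VRT, in document order."""
    root = ET.fromstring(xml_text)
    seen = []
    for node in root.iter("SourceFilename"):
        name = (node.text or "").strip()
        # The same file is usually referenced once per band: keep it once
        if name and name not in seen:
            seen.append(name)
    return seen


sources = vrt_sources(sample_vrt)
print(sources)
```

In practice `rasterio` resolves these references for us, so this is only meant to show what the index contains.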

Since `rasterio` supports `.vrt` files, we can inspect it:

In [6]:
%time tile30U = rasterio.open(p)

CPU times: user 3.98 ms, sys: 3.98 ms, total: 7.96 ms
Wall time: 538 ms


In [7]:
tile30U.files

['/vsicurl/http://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A_UTM_10/V1-0/30U/30U_UTM.vrt',
 '/vsicurl/http://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A_UTM_10/V1-0/30U/30U_UTM.vrt.ovr',
 '/vsicurl/http://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A_UTM_10/V1-0/30U/30U_UTM.vrt.ovr.ovr',
 '/vsicurl/http://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A_UTM_10/V1-0/30U/30U_UTM.vrt.ovr.ovr.ovr',
 '/vsicurl/http://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A_UTM_10/V1-0/30U/30U_UTM.vrt.ovr.ovr.ovr.ovr',
 '/vsicurl/http://jeodpp.jrc.ec.europa.eu/ftp/jrc-

In [10]:
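# Drop the GDAL "/vsicurl/" prefix; `str.strip` would remove a set of characters, not a prefix
tile30U.files[-1].replace("/vsicurl/", "", 1)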

'http://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/GHSL/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A/GHS_composite_S2_L1C_2017-2018_GLOBE_R2020A_UTM_10/V1-0/30U/S2_percentile_UTM_209-0000000000-0000000000.tif'
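One caveat when removing the `/vsicurl/` prefix: `str.strip` treats its argument as a set of characters to trim from both ends, not as a prefix. It happens to give the right answer for paths ending in `.tif`, but not in general. The sketch below illustrates the difference with a made-up URL; `drop_vsicurl` is a hypothetical helper, not part of `utils`.

```python
def drop_vsicurl(path, prefix="/vsicurl/"):
    """Remove the GDAL '/vsicurl/' prefix only if the path starts with it."""
    return path[len(prefix):] if path.startswith(prefix) else path


# Made-up URL whose last characters also appear in "/vsicurl/"
example = "/vsicurl/http://example.org/tiles/rivers"

stripped = example.strip("/vsicurl/")  # also eats the trailing "rs"
cleaned = drop_vsicurl(example)        # only removes the prefix

print(stripped)
print(cleaned)
```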

In [None]:
# Download the last file listed, after dropping the "/vsicurl/" prefix
utils.download(tile30U.files[-1].replace("/vsicurl/", "", 1))

Downloading S2_percentile_UTM_209-0000000000-0000000000.tif into S2_percentile_UTM_209-0000000000-0000000000.tif:   7%|▋         | 162M/2.18G [02:42<14:21, 2.53MB/s]     

In [18]:
%time tst = rioxarray.open_rasterio(tile30U.files[-1])

CPU times: user 47 ms, sys: 5.52 ms, total: 52.5 ms
Wall time: 956 ms


In [22]:
tst.shape

(4, 23296, 23296)
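A quick back-of-the-envelope count gives a sense of how large a single file is, and why it is worth thinning the array before plotting it. The one byte per value is an assumption (the array's dtype is not shown here), although it would be in line with the 2.18G reported by the download progress bar.

```python
# Shape reported for the file: (bands, rows, columns)
shape = (4, 23296, 23296)

n_values = shape[0] * shape[1] * shape[2]
bytes_per_value = 1  # assumption: 8-bit values
size_gb = n_values * bytes_per_value / 1e9
print(f"{n_values:,} values, about {size_gb:.2f} GB at {bytes_per_value} byte(s) each")

# Keeping three bands at a tenth of the resolution along each axis
thin_shape = (3, shape[1] // 10, shape[2] // 10)
print(thin_shape)
```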

In [None]:
# Canvas sizes must be integers: use a tenth of the raster's height and width
canvas = ds.Canvas(plot_height=tst.shape[1] // 10,
                   plot_width=tst.shape[2] // 10
                  )
%time thin = canvas.raster(tst.sel(band=[1, 2, 3]))
thin

## Download

The download strategy is:

1. Loop over the UTM tiles
1. Create the tile's subdirectory if it does not exist yet
1. Retrieve `.ovr`
1. Loop over each `.tif` file
1. Download the file and place it in the subdirectory if it is not already there
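The steps above could be sketched roughly as follows. This is a minimal sketch under several assumptions: `mirror_tiles` and its `list_files` argument are hypothetical (in this notebook, `list_files` could be built on `rasterio.open(vrt_url).files`), the `.vrt` index is mirrored alongside the other files, overviews and GeoTIFFs are treated uniformly, and `urllib.request.urlretrieve` stands in for `utils.download`, whose signature is not shown here. The demo at the bottom uses fake listing and fetching functions so that it runs offline.

```python
import os
import tempfile
from urllib.request import urlretrieve


def mirror_tiles(base_url, tiles, local_dir, list_files, fetch=urlretrieve):
    """Mirror every file of every tile, skipping files already on disk.

    `list_files` takes a tile's .vrt URL and returns the URLs of its
    components (overviews and GeoTIFFs); `fetch` takes (url, target).
    """
    downloaded = []
    for tile in tiles:                                     # loop over UTM tiles
        tile_dir = os.path.join(local_dir, tile)
        os.makedirs(tile_dir, exist_ok=True)               # create if missing
        vrt_url = f"{base_url}{tile}/{tile}_UTM.vrt"
        for url in [vrt_url] + list(list_files(vrt_url)):  # index + components
            target = os.path.join(tile_dir, url.rsplit("/", 1)[-1])
            if not os.path.exists(target):                 # download only once
                fetch(url, target)
                downloaded.append(target)
    return downloaded


# Offline demo with stand-ins for the listing and the download
def fake_list(vrt_url):
    folder = vrt_url.rsplit("/", 1)[0]
    return [vrt_url + ".ovr", folder + "/part.tif"]


def fake_fetch(url, target):
    with open(target, "wb"):
        pass  # create an empty placeholder file


with tempfile.TemporaryDirectory() as tmp:
    first = mirror_tiles("http://example.org/", ["30U", "31U"], tmp, fake_list, fake_fetch)
    second = mirror_tiles("http://example.org/", ["30U", "31U"], tmp, fake_list, fake_fetch)

print(len(first), len(second))
```

Checking for the target file before fetching makes the loop safe to re-run after an interrupted session, although a partially downloaded file would be skipped too, so a size check may be worth adding.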