In [4]:
%load_ext autoreload
%autoreload 2

%load_ext dotenv
%dotenv

# Download Sentinel-2 data

In this notebook we will use the EOTDL environment to download the Sentinel-2 imagery that will make up our dataset.

First of all, we need our AoI bounding box and the time interval for which to download images. If you missed how we obtained them, go to the [00_exploration](00_exploration.ipynb) notebook.

Let's load the AoI bounding box.

In [5]:
import geopandas as gpd

boadella_bbox_gdf = gpd.read_file('data/boadella_bbox.geojson', crs='EPSG:4326')

boadella_bbox = list(boadella_bbox_gdf.geometry.total_bounds)
boadella_bbox

[2.792027806635944, 42.33057868499878, 2.838021549182864, 42.36457137143556]
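The list returned by `total_bounds` is ordered as `(minx, miny, maxx, maxy)`, which for EPSG:4326 means `(min_lon, min_lat, max_lon, max_lat)`. As a minimal sketch, a small sanity check can catch a swapped or out-of-range bounding box before any request is sent. The helper name `check_bbox` is hypothetical and not part of EOTDL.

```python
def check_bbox(bbox):
    """Validate a (min_lon, min_lat, max_lon, max_lat) box in EPSG:4326.

    Hypothetical helper for illustration; returns the box size in degrees.
    """
    if len(bbox) != 4:
        raise ValueError(f"expected 4 values, got {len(bbox)}")
    min_lon, min_lat, max_lon, max_lat = bbox
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError("longitudes out of range or in the wrong order")
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError("latitudes out of range or in the wrong order")
    return max_lon - min_lon, max_lat - min_lat


# Same values as the bounding box printed above
example_bbox = [2.792027806635944, 42.33057868499878,
                2.838021549182864, 42.36457137143556]
width_deg, height_deg = check_bbox(example_bbox)
```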

And the range of dates.

In [6]:
import csv

dates = []
with open("data/dates.csv", "r") as file:
    reader = csv.reader(file)
    for row in reader:
        dates.append(row[0])
dates.sort()

dates[:5]

['2019-06-02', '2019-06-07', '2019-06-17', '2019-06-27', '2019-07-02']

As we saw in the previous section, downloading the imagery requires a Python dictionary with some parameters, such as the bounding box and the time interval. However, our dates with available images are stored as single days, and we need them as a list of `(day, day)` tuples.

In [7]:
# Use only the first five dates; each one becomes a single-day (start, end) interval
time_interval = [(date, date) for date in dates[:5]]
time_interval[:5]

[('2019-06-02', '2019-06-02'),
 ('2019-06-07', '2019-06-07'),
 ('2019-06-17', '2019-06-17'),
 ('2019-06-27', '2019-06-27'),
 ('2019-07-02', '2019-07-02')]
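Since the dates come from a CSV file, it can be worth checking that each one is a valid ISO date before building the intervals. The following is only a sketch of that idea using the standard library; `single_day_intervals` is a hypothetical helper, not an EOTDL function.

```python
from datetime import date


def single_day_intervals(days):
    """Turn ISO date strings into sorted (day, day) tuples.

    Hypothetical helper: raises ValueError if a string is not a valid
    YYYY-MM-DD date, so a malformed CSV row is caught early.
    """
    intervals = []
    # ISO date strings sort chronologically when sorted as text
    for day in sorted(days):
        date.fromisoformat(day)  # validation only, the string is kept as-is
        intervals.append((day, day))
    return intervals


example_intervals = single_day_intervals(["2019-06-07", "2019-06-02"])
```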

Now that we have it, we can create the parameters dictionary as we have seen.

In [8]:
boadella_download_dict = {
    'Boadella': {
        'bounding_box': boadella_bbox,
        'time_interval': time_interval
    }
}

We must connect to the [Sentinel Hub](https://www.sentinel-hub.com/) platform through the EOTDL client.

In [9]:
from os import getenv
from eotdl.access import SHClient

sh_client_id = getenv('SH_CLIENT_ID')
sh_client_secret = getenv('SH_CLIENT_SECRET')

client = SHClient(sh_client_id=sh_client_id, 
                  sh_client_secret=sh_client_secret)
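Note that `getenv` returns `None` when a variable is not defined, for example if the `.env` file was not loaded. As a hedged sketch, a check like the one below could be run before creating the client to fail with a clear message. The helper `require_env` is hypothetical, and how `SHClient` itself reacts to missing credentials is not covered here.

```python
import os


def require_env(*names, environ=None):
    """Return the requested environment variables or raise if any is missing.

    Hypothetical helper; `environ` defaults to os.environ and can be
    replaced by a plain dict for testing.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in names}


# Example with a fake environment (placeholder values, not real credentials)
fake_env = {"SH_CLIENT_ID": "my-id", "SH_CLIENT_SECRET": "my-secret"}
credentials = require_env("SH_CLIENT_ID", "SH_CLIENT_SECRET", environ=fake_env)
```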

Now, let's download the imagery!

A note on the following code block. To download imagery through the Sentinel Hub API, we need to define several parameters, such as the data collection (in this workshop, `sentinel-2-l2a`), the EvalScript, the resolution and so on. To avoid this time-consuming definition, those parameters are wrapped in a Python class such as `sentinel_2_download_parameters`. With that, we only have to provide the download dictionary with the bounding box and time interval, plus the folder where we want to download the data. And voilà! Everything else is managed by the EOTDL environment.

In [10]:
from eotdl.access import sentinel_2_download_parameters

sentinel_2_download_parameters.data_to_download = boadella_download_dict   # Give the dictionary with the data to download
sentinel_2_download_parameters.data_folder = 'data/sentinel_2'             # Give the folder where the data will be downloaded

process_requests = client.request_bulk_data(sentinel_2_download_parameters)

In [11]:
%%time

data = client.download_data(process_requests)

CPU times: user 261 ms, sys: 71.5 ms, total: 333 ms
Wall time: 13.2 s


That's all! We have downloaded the images for our dataset. Let's check them!

In [16]:
from glob import glob

rasters = glob('data/sentinel_2/*/*/*.tiff')
rasters[:5]

['data/sentinel_2/Boadella_2019-06-07/90839abb69fd0df08bcc798c4b210006/response.tiff',
 'data/sentinel_2/Boadella_2019-06-02/31065f0313941942895ce53ab4f9a2a3/response.tiff',
 'data/sentinel_2/Boadella_2019-07-02/073923a2d94447815cf3099b06816205/response.tiff',
 'data/sentinel_2/Boadella_2019-06-17/7aff418802685f538f0daab8ec5ef9b0/response.tiff',
 'data/sentinel_2/Boadella_2019-06-27/219cbe573a0e901311771f20bbe1c050/response.tiff']
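Beyond eyeballing the list, we can check that every requested date produced a raster. This is a minimal sketch that relies only on the `<id>_<date>/<request_id>/response.tiff` layout visible in the paths above; the helper names are hypothetical.

```python
from pathlib import PurePosixPath


def rasters_by_date(paths):
    """Group raster paths by the date encoded in their '<id>_<date>' folder."""
    by_date = {}
    for path in paths:
        # parts[-3] is the '<id>_<date>' folder, two levels above the file
        folder = PurePosixPath(path).parts[-3]
        day = folder.rpartition("_")[2]
        by_date.setdefault(day, []).append(path)
    return by_date


def missing_dates(paths, expected_days):
    """Return the expected days for which no raster was found."""
    return sorted(set(expected_days) - set(rasters_by_date(paths)))


example_paths = [
    "data/sentinel_2/Boadella_2019-06-07/90839abb69fd0df08bcc798c4b210006/response.tiff",
    "data/sentinel_2/Boadella_2019-06-02/31065f0313941942895ce53ab4f9a2a3/response.tiff",
]
not_downloaded = missing_dates(example_paths, ["2019-06-02", "2019-06-07", "2019-06-17"])
```

With the real `rasters` list and the first five entries of `dates`, an empty result would mean every requested day has an image.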

It looks good!

## Format downloaded data

When we download imagery through the Sentinel Hub client, the EOTDL environment by default stores every image in a folder named `<id>_<date>/<request_id>`. We can see this in the path of any of the downloaded rasters.

In [17]:
rasters[0]

'data/sentinel_2/Boadella_2019-06-07/90839abb69fd0df08bcc798c4b210006/response.tiff'

To maintain a logical structure and ensure that the dataset is digestible by the EOTDL environment, we must make sure that the project structure is compatible and that every image has an associated metadata file with the necessary information about the image, which will be used later for the STAC generation.

To do so, the EOTDL environment has a `Folder Formatter` that does exactly that: it extracts the images into a more human-readable folder structure, renames them with the constellation name as a label, and generates a `metadata.json` file from the `request.json`, with the necessary metadata such as the acquisition date of the image, the type, the bounding box, and so on.

Let's format the folder structure.

In [18]:
from eotdl.curation import SHFolderFormatter

formatter = SHFolderFormatter('data/sentinel_2')
formatter.root

formatter.format_folders()

Now, if we look at the raster paths again, we will see that the folder structure is much more readable.

In [19]:
# Without recursive=True, '**' behaves like '*', so this matches one folder level
rasters = glob('data/sentinel_2/**/*.tiff')
rasters[:5]

['data/sentinel_2/Boadella_2019-06-07/sentinel-2-l2a.tiff',
 'data/sentinel_2/Boadella_2019-06-02/sentinel-2-l2a.tiff',
 'data/sentinel_2/Boadella_2019-07-02/sentinel-2-l2a.tiff',
 'data/sentinel_2/Boadella_2019-06-17/sentinel-2-l2a.tiff',
 'data/sentinel_2/Boadella_2019-06-27/sentinel-2-l2a.tiff']
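To confirm that the formatting step left every scene folder complete, a small check along these lines could be used. It is only a sketch: it assumes that `metadata.json` is written next to the raster inside each `<id>_<date>` folder, and `incomplete_scene_folders` is a hypothetical helper, not part of EOTDL. The demo runs on a temporary directory with empty placeholder files rather than on real data.

```python
import tempfile
from pathlib import Path


def incomplete_scene_folders(root):
    """List scene folders lacking a .tiff raster or a metadata.json file.

    Assumption: both files live directly inside each '<id>_<date>' folder.
    """
    incomplete = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        has_raster = any(folder.glob("*.tiff"))
        has_metadata = (folder / "metadata.json").is_file()
        if not (has_raster and has_metadata):
            incomplete.append(folder.name)
    return incomplete


# Demo on a throwaway directory: one complete folder, one without metadata
with tempfile.TemporaryDirectory() as tmp:
    complete = Path(tmp) / "Boadella_2019-06-02"
    complete.mkdir()
    (complete / "sentinel-2-l2a.tiff").touch()
    (complete / "metadata.json").write_text("{}")

    partial = Path(tmp) / "Boadella_2019-06-07"
    partial.mkdir()
    (partial / "sentinel-2-l2a.tiff").touch()

    demo_result = incomplete_scene_folders(tmp)
```

On the real data, `incomplete_scene_folders('data/sentinel_2')` returning an empty list would indicate that every folder has both files, under the assumption stated above.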

To sum up this section, we have downloaded the Sentinel-2 images that will make up our dataset through the Sentinel Hub client and formatted the folder structure into a much more readable layout. With this, we have our [Q0 dataset](../00_eotdl.ipynb)!

Let's continue in the `02_stac` notebook and generate the STAC catalog!