# Download L2 matchup data from EUMETSAT using the ThoMaS toolkit

**Last updated: 03/09/2024**

EUMETSAT's Data Store distributes **L1** and **L2** data from **European Space Agency (ESA)** satellites, such as Sentinel-3. This script uses the conda environment `thomas` to download L2 ocean colour satellite data, specifically chlorophyll, from the EUMETSAT Data Store. It then performs matchups between the satellite data and our choice of in situ chlorophyll measurements (here, obtained through HPLC analysis).

This script is specifically written to carry out the following tasks:
- Retrieve **OLCI L2 data** overpassing point locations as defined by a list of **in situ HPLC measurements**.
- Access **Sentinel-3A** and **Sentinel-3B** OLCI L2 data from the latest collection OL__L2M.003 (Collection 3) in full resolution (**FR**).
- Submit matchup point locations using a **SeaBASS-formatted file**.
- Apply EUMETSAT's standard data extraction protocol for matchups, which includes a **window size of 3x3 pixels** and a **window time of +/- 2 hours** (as in Zibordi et al., 2009). 
- Generate and store the following outputs: S3 files (SatData), minifiles, extractions, extraction statistics and matchup statistics.
- Save all results and outputs related to the run at /path/to/data_matchups_thomas.
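
To make the time criterion in the list above concrete, here is a minimal sketch of a +/- 2 hour check between an in situ sample and a satellite overpass. It is not ThoMaS's own implementation; the function name `within_time_window` and the example timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

# Illustrative tolerance mirroring the +/- 2 h window described above
TIME_TOLERANCE = timedelta(hours=2)

def within_time_window(insitu_time, overpass_time, tolerance=TIME_TOLERANCE):
    """Return True if the overpass falls within +/- tolerance of the in situ sample."""
    return abs(overpass_time - insitu_time) <= tolerance

insitu = datetime(2018, 4, 8, 9, 0, 0)            # hypothetical sampling time
overpass_ok = datetime(2018, 4, 8, 10, 20, 22)    # 1 h 20 min after sampling
overpass_late = datetime(2018, 4, 8, 11, 30, 0)   # 2 h 30 min after sampling

print(within_time_window(insitu, overpass_ok))    # True
print(within_time_window(insitu, overpass_late))  # False
```

Using `abs()` on the difference makes the check symmetric, so overpasses before and after the sampling time are treated alike.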

For additional guidance, refer to the comprehensive ThoMaS toolkit webinar by Juan Ignacio Gossn, available [here](https://training.eumetsat.int/course/view.php?id=468).

## Define download parameters, import libraries, functions and load credentials

In [6]:
# Main user parameter configuration area

# Parameters for downloading Sentinel-3 data
WINDOW_SIZE_LIST = [3, 3, 5, 5] # e.g., choice of 3 x 3 or 5 x 5
WINDOW_TIME_LIST = [7200, 10800, 7200, 10800] # in seconds (=2h, 3h, 2h, 3h)

# Download directories for our EUMETSAT products with names matching the configurations above
DOWNLOAD_DIR_LIST = [ 
    "data_matchups_thomas_3x3_2h",
    "data_matchups_thomas_3x3_3h",
    "data_matchups_thomas_5x5_2h",
    "data_matchups_thomas_5x5_3h"
]

# File containing our in situ observations (AR: created by my Matlab function prepareHPLCdata.m)
NAME_HPLC_DATA_FILE = 'cefasHPLCfiltered.csv' # store it in ./data/processed/
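
The three lists above must stay aligned by position (same length, same order). One way to avoid mismatches, sketched here under the assumption that the directory naming pattern stays as shown, is to derive the directory names from the window settings. The helper `build_download_dir_names` is hypothetical and not part of the notebook.

```python
# Hypothetical helper: derive directory names from the window settings so the
# lists cannot drift apart when a configuration is added or removed
WINDOW_SIZE_LIST = [3, 3, 5, 5]
WINDOW_TIME_LIST = [7200, 10800, 7200, 10800]  # seconds

def build_download_dir_names(window_sizes, window_times, prefix="data_matchups_thomas"):
    """Return one directory name per (window size, time tolerance) pair."""
    if len(window_sizes) != len(window_times):
        raise ValueError("window_sizes and window_times must have the same length")
    return [
        f"{prefix}_{size}x{size}_{seconds // 3600}h"
        for size, seconds in zip(window_sizes, window_times)
    ]

download_dir_names = build_download_dir_names(WINDOW_SIZE_LIST, WINDOW_TIME_LIST)
print(download_dir_names)
```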

**Do not edit the code in the rest of this Section unless you know what you're doing!**

In [7]:
# Import the necessary libraries
import sys
import os
from pathlib import Path
import json
import eumdac
import pandas as pd
from datetime import datetime, timedelta

In [8]:
# Import ThoMaS toolkit
PATH_ROOT_DIR = Path.cwd().resolve().parents[1] # /absolute/path/to/two/levels/up
NAME_THOMAS_DIRECTORY = "ThoMaS"
full_path_thomas_directory = os.path.join(PATH_ROOT_DIR,"resources","external",NAME_THOMAS_DIRECTORY)
sys.path.insert(0, full_path_thomas_directory) # add ThoMaS directory to the system path
from main import ThoMaS_main

In [4]:
# Create download directories
download_dir_list_path = []
for download_directory in DOWNLOAD_DIR_LIST:
    download_dir_path = os.path.join(PATH_ROOT_DIR,"data","raw","EUMETSAT_L2_data",download_directory)
    download_dir_list_path.append(download_dir_path)
    
    # Create the directory at the specified path if it doesn't already exist
    if not os.path.exists(download_dir_path):
        os.makedirs(download_dir_path)
        print(f"Directory created: {download_dir_path}")
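
An equivalent way to create the directories, shown as a sketch, uses `pathlib` with `exist_ok=True`, which removes the need for the explicit existence check. The helper name `ensure_directories` is made up, and the demonstration runs in a temporary directory rather than in `data/raw/EUMETSAT_L2_data`.

```python
import tempfile
from pathlib import Path

def ensure_directories(base_dir, names):
    """Create base_dir/name for each name; existing directories are left untouched."""
    created = []
    for name in names:
        path = Path(base_dir) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created

# Demonstration in a temporary directory (stand-in for the real download location)
with tempfile.TemporaryDirectory() as tmp:
    paths = ensure_directories(tmp, ["data_matchups_thomas_3x3_2h", "data_matchups_thomas_5x5_3h"])
    all_exist = all(p.is_dir() for p in paths)
    # A second call is harmless because exist_ok=True
    ensure_directories(tmp, ["data_matchups_thomas_3x3_2h"])

print(all_exist)
```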

In [9]:
# Load EUMETSAT credentials
with open(os.path.join(os.path.expanduser("~"),'.eumdac_credentials')) as json_file:
    credentials = json.load(json_file)
    token = eumdac.AccessToken((credentials['consumer_key'], credentials['consumer_secret']))
    print(f"This token '{token}' expires {token.expiration}")

This token '66307d7e-4a71-3de1-bc2b-1cfd6874c6ab' expires 2024-09-03 11:10:21.594188
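
The cell above expects `~/.eumdac_credentials` to be a JSON file with the keys `consumer_key` and `consumer_secret`. The following sketch shows that layout and a small validation step; the helper `load_credentials` and the placeholder values are illustrative only, and no call to the Data Store is made.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("consumer_key", "consumer_secret")

def load_credentials(path):
    """Read the JSON credentials file and check that both keys are present."""
    with open(path) as json_file:
        credentials = json.load(json_file)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f"Missing keys in {path}: {missing}")
    return credentials["consumer_key"], credentials["consumer_secret"]

# Demonstration with placeholder values written to a temporary file
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / ".eumdac_credentials"
    demo_path.write_text(json.dumps({"consumer_key": "my-key", "consumer_secret": "my-secret"}))
    key, secret = load_credentials(demo_path)

print(key, secret)
```

Failing early on a missing key gives a clearer message than the `KeyError` that would otherwise surface when the token is built.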


## Inspect ESA's OLCI products

In [15]:
datastore = eumdac.DataStore(token) # object that contains all the collections
for collection in datastore.collections:
    try:
        # Filter the list for Sentinel-3 OLCI
        if "OLCI" in collection.title and "EO:EUM:DAT:0" in str(collection):
            print(f"Collection ID({collection}): {collection.title}")
    except Exception:
        # Skip collections whose metadata cannot be retrieved
        continue

Collection ID(EO:EUM:DAT:0410): OLCI Level 1B Reduced Resolution - Sentinel-3
Collection ID(EO:EUM:DAT:0408): OLCI Level 2 Ocean Colour Reduced Resolution - Sentinel-3
Collection ID(EO:EUM:DAT:0409): OLCI Level 1B Full Resolution - Sentinel-3
Collection ID(EO:EUM:DAT:0407): OLCI Level 2 Ocean Colour Full Resolution - Sentinel-3
Collection ID(EO:EUM:DAT:0556): OLCI Level 2 Ocean Colour Full Resolution (version BC003) - Sentinel-3 - Reprocessed
Collection ID(EO:EUM:DAT:0557): OLCI Level 2 Ocean Colour Reduced Resolution (version BC003) - Sentinel-3 - Reprocessed
Collection ID(EO:EUM:DAT:0577): OLCI Level 1B Full Resolution (version BC002) - Sentinel-3 - Reprocessed
Collection ID(EO:EUM:DAT:0578): OLCI Level 1B Reduced Resolution (version BC002) - Sentinel-3 - Reprocessed


We want two reprocessed products:
- **EO:EUM:DAT:0556**: OLCI L2 FR (300 m) reprocessing BC003
- **EO:EUM:DAT:0557**: OLCI L2 RR (1.2 km) reprocessing BC003
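
The same selection can be made programmatically from the titles printed above. This is a sketch that works on a plain dictionary copied from that listing, so it runs without a Data Store connection; the helper `select_collections` is hypothetical.

```python
# Collection IDs and titles copied from the listing above
olci_collections = {
    "EO:EUM:DAT:0410": "OLCI Level 1B Reduced Resolution - Sentinel-3",
    "EO:EUM:DAT:0408": "OLCI Level 2 Ocean Colour Reduced Resolution - Sentinel-3",
    "EO:EUM:DAT:0409": "OLCI Level 1B Full Resolution - Sentinel-3",
    "EO:EUM:DAT:0407": "OLCI Level 2 Ocean Colour Full Resolution - Sentinel-3",
    "EO:EUM:DAT:0556": "OLCI Level 2 Ocean Colour Full Resolution (version BC003) - Sentinel-3 - Reprocessed",
    "EO:EUM:DAT:0557": "OLCI Level 2 Ocean Colour Reduced Resolution (version BC003) - Sentinel-3 - Reprocessed",
    "EO:EUM:DAT:0577": "OLCI Level 1B Full Resolution (version BC002) - Sentinel-3 - Reprocessed",
    "EO:EUM:DAT:0578": "OLCI Level 1B Reduced Resolution (version BC002) - Sentinel-3 - Reprocessed",
}

def select_collections(collections, level, reprocessed):
    """Return the sorted IDs whose title mentions `level` and matches the reprocessed flag."""
    return sorted(
        collection_id
        for collection_id, title in collections.items()
        if level in title and title.endswith("Reprocessed") == reprocessed
    )

selected = select_collections(olci_collections, level="Level 2", reprocessed=True)
print(selected)  # ['EO:EUM:DAT:0556', 'EO:EUM:DAT:0557']
```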

## Read in our in situ HPLC observations

**You might need to modify the code in this Section based on the configuration of your in situ observations file.**

In [7]:
full_path_hplc_data_dir = os.path.join(PATH_ROOT_DIR,"data","processed",NAME_HPLC_DATA_FILE)
matchup_locations_list = pd.read_csv(full_path_hplc_data_dir, sep = ',')
matchup_locations_list

Unnamed: 0,idd,Survey_name,Station_number,Prime_number,DateTime,Latitude,Longitude,Smartbuoy,Sample_depth,TP_ug_L,...,Lut_ug_L,Myxo_ug_L,Croc_ug_L,x19_Keto_Hex_fuco_ug_L,Hexkfuco_ug_L,HexkfucoL_ug_L,x4keto_hex_ug_L,x4keto_hexL_ug_L,bathymetry_m,season
0,625,CEND19_17,230.0,102.0,28-Oct-2017 01:18:00,48.350783,-5.750117,,4,0.701378,...,0.000000,0.000000,0.00000,,,,0.000000,,-116.982548,Autumn
1,671,CEND17_18,169.0,102.0,25-Oct-2018 03:42:00,48.366280,-5.725660,,6,0.796561,...,0.000000,,0.00000,,,,0.004086,,-116.179468,Autumn
2,672,CEND17_18,176.0,106.0,25-Oct-2018 20:40:00,48.547920,-4.928820,,6,0.628834,...,0.000662,,0.00000,,,,0.002824,,-74.689037,Autumn
3,626,CEND19_17,231.0,106.0,28-Oct-2017 05:48:00,48.552300,-4.915950,,4,0.858928,...,0.000000,0.000000,0.00000,,,,0.000000,,-77.795798,Autumn
4,55,CEND09_11,44.0,,22-May-2011 15:50:00,48.778933,-4.390117,,5,1.906309,...,0.004340,0.003500,0.00701,,,,,,-88.030648,Spring
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
669,499,CEND15_13,191.0,72.0,29-Aug-2013 12:40:00,61.232000,-0.401000,,4,0.620383,...,0.001790,0.000000,,0.00000,0.0,0.0,,,-164.470400,Summer
670,166,CEND13_10,160.0,74.0,01-Sep-2010 04:06:00,61.251333,1.393667,,3,6.715475,...,0.024246,0.013840,,0.11529,,,,,-148.207900,Summer
671,445,CEND18_15,125.0,73.0,30-Aug-2015 14:47:00,61.285900,0.488133,,4,2.892239,...,0.014518,0.000000,0.00000,,,,0.016483,,-167.815296,Summer
672,533,CEND18_16,120.0,73.0,29-Aug-2016 09:35:00,61.288283,0.500183,,4,2.378788,...,0.001587,0.006305,0.00000,,,,0.000000,,-168.083801,Summer


In [8]:
# Extract the data that we need from the HPLC observations file
LIST_OBS_LON = matchup_locations_list.Longitude[:]
LIST_OBS_LAT = matchup_locations_list.Latitude[:]
LIST_OBS_DATETIME = matchup_locations_list.DateTime[:]
LIST_OBS_CHLA = matchup_locations_list.TChlA_ug_L[:]

# Change the date time format as required for the SeaBASS file
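# The explicit format matches entries such as "28-Oct-2017 01:18:00" and avoids ambiguous parsing
LIST_OBS_DATETIME = pd.to_datetime(LIST_OBS_DATETIME, format="%d-%b-%Y %H:%M:%S")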
LIST_OBS_DATE = LIST_OBS_DATETIME.dt.strftime("%Y%m%d")
LIST_OBS_TIME = LIST_OBS_DATETIME.dt.strftime("%H:%M:%S")
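
As a worked example of the conversion above, the sketch below applies the same two output formats to a single timestamp taken from the table, using only the standard library. The helper name `to_seabass_date_time` is made up for illustration.

```python
from datetime import datetime

def to_seabass_date_time(timestamp, input_format="%d-%b-%Y %H:%M:%S"):
    """Split a timestamp string into SeaBASS-style date (yyyymmdd) and time (hh:mm:ss).

    Note: %b parses abbreviated month names according to the active locale,
    so English names such as "Oct" assume an English or C locale.
    """
    parsed = datetime.strptime(timestamp, input_format)
    return parsed.strftime("%Y%m%d"), parsed.strftime("%H:%M:%S")

date_str, time_str = to_seabass_date_time("28-Oct-2017 01:18:00")
print(date_str, time_str)  # 20171028 01:18:00
```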

## Create a SeaBASS-formatted file called `matchup_entries.csv` to submit our observations

This Section builds the SeaBASS-formatted file. I learned how to build it using resources available [here](ThoMaS/examples/example7). Fields for the header are specified in the [IDB_config file](ThoMaS/IDB_config/IDB_config). **Currently, this SeaBASS-formatted file is tailored to my specific needs, but you can use it as a template to develop your own version.**

In [9]:
# Build the SeaBASS-formatted file

FILENAME_SEABASS = "matchup_entries.csv" # has to have the suffix .csv

seabass_file_list_path = []

for download_directory_path in download_dir_list_path:
    
    seabass_file_path = os.path.join(download_directory_path,FILENAME_SEABASS)
    seabass_file_list_path.append(seabass_file_path)

    # Data matrix component
    data_matrix = []
    for i in range(len(LIST_OBS_LAT)): 
        date = LIST_OBS_DATE[i]
        time = LIST_OBS_TIME[i]
        lat = LIST_OBS_LAT[i]
        lon = LIST_OBS_LON[i]
        chla = LIST_OBS_CHLA[i]
        data_matrix.append((date, time, lat, lon, chla))

    if os.path.exists(seabass_file_path):
        os.remove(seabass_file_path)
    
    with open(seabass_file_path,'w') as file:
    
        # Write the SeaBASS header
        # Fields for the header are specified in the IDB_config file
        # I've removed "file.write("/measurement_depth=0\n")" as it's not in the IDB_config
        file.write("/begin_header\n")
        file.write("/investigators=Anna_Rufas\n")
        file.write("/affiliations=University_of_Oxford\n")
        file.write("/contact=anna.rufasblanco@earth.ox.ac.uk\n")
        file.write("/experiment=a_cefas_experiment\n")
        file.write("/cruise=a_cefas_cruise\n")
        file.write("/station=NA\n")
        file.write(f"/data_file_name={FILENAME_SEABASS}\n")
        file.write("/documents=unprovided_document\n")
        file.write("/data_type=pigment\n")
        file.write("/calibration_files=unprovided_calibration_file\n")
        file.write("/start_date=20100101\n")
        file.write("/end_date=20191101\n")
        file.write("/start_time=00:00:00[GMT]\n")
        file.write("/end_time=00:00:00[GMT]\n")
        file.write("/north_latitude=62.0[DEG]\n")
        file.write("/south_latitude=48.0[DEG]\n")
        file.write("/east_longitude=9.0[DEG]\n")
        file.write("/west_longitude=-11.0[DEG]\n")
        file.write("/water_depth=NA\n")
        file.write("/missing=-9999\n")
        file.write("/delimiter=comma\n")
        file.write("/fields=date,time,lat,lon,Chl-a\n")
        file.write("/units=yyyymmdd,hh:mm:ss,degrees,degrees,mg/m^3\n")
        file.write("/end_header\n")
    
        # Write the data matrix
        for row in data_matrix:
            file.write(",".join(map(str, row)) + "\n")
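
The header above declares `/missing=-9999`, but a missing chlorophyll value in the data would be written as `nan` by `str()`. The sketch below shows one way to substitute the declared flag when formatting a row. It is an optional refinement, not what the cell above does; `format_row` and the chlorophyll values are hypothetical.

```python
import math

MISSING_VALUE = -9999  # matches the /missing entry in the header above

def format_row(values, missing=MISSING_VALUE):
    """Join one record with commas, replacing None or NaN with the missing-value flag."""
    cleaned = []
    for value in values:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            cleaned.append(str(missing))
        else:
            cleaned.append(str(value))
    return ",".join(cleaned)

# Rows follow the field order date,time,lat,lon,Chl-a; chlorophyll values are made up
row_ok = format_row(("20171028", "01:18:00", 48.350783, -5.750117, 0.7))
row_nan = format_row(("20171028", "05:48:00", 48.5523, -4.91595, float("nan")))

print(row_ok)   # 20171028,01:18:00,48.350783,-5.750117,0.7
print(row_nan)  # 20171028,05:48:00,48.5523,-4.91595,-9999
```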

## Build the file `config_file.ini` to run ThoMaS

The configuration file tells ThoMaS how to run the matchup analysis. By default it applies EUMETSAT's matchup protocol, but the user can define their own extraction statistics methods, window size, time tolerance and relevant statistics. Window size and time tolerance depend on the variability of the waters. In complex waters, reduce the window size to 3 x 3 pixels, accepting that fewer pixels are then available to compute statistics, and reduce the time tolerance too. In homogeneous waters, set a 5 x 5 pixel window: more pixels make the extraction statistics more robust. Zibordi et al. (2009) use a time tolerance of 2 h.
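
To illustrate how window size changes the number of pixels entering the statistics, here is a minimal sketch that extracts a square window around a centre pixel and summarises the valid values. It is not EUMETSAT's protocol or ThoMaS's code; the grid values and the helpers `extract_window` and `window_stats` are invented for the example.

```python
import math
from statistics import mean, stdev

def extract_window(grid, row, col, win_size):
    """Return the values in the win_size x win_size window centred on (row, col).

    Pixels that would fall outside the grid are skipped.
    """
    half = win_size // 2
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                values.append(grid[r][c])
    return values

def window_stats(values):
    """Summarise the valid (non-NaN) pixels of a window."""
    valid = [v for v in values if not math.isnan(v)]
    return {
        "n_valid": len(valid),
        "mean": mean(valid) if valid else math.nan,
        "std": stdev(valid) if len(valid) > 1 else math.nan,
    }

# Hypothetical 5 x 5 chlorophyll grid (mg/m^3); nan marks a flagged centre pixel
nan = math.nan
grid = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 2.0, 2.0, 2.0, 1.0],
    [1.0, 2.0, nan, 2.0, 1.0],
    [1.0, 2.0, 2.0, 2.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]

stats_3x3 = window_stats(extract_window(grid, 2, 2, 3))
stats_5x5 = window_stats(extract_window(grid, 2, 2, 5))
print(stats_3x3["n_valid"], stats_5x5["n_valid"])  # 8 24
```

In this toy grid the 5 x 5 window triples the number of valid pixels but also mixes in the lower values at the edge, which is the trade-off between sample size and spatial variability.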

The `config_file.ini` built below has a `[global]` section, which holds the `workflow` entry listing the steps to run, followed by one section per step; the code comments mark the optional ones. **Currently, this `config_file.ini` is tailored to my specific needs, but you can use it as a template to develop your own version.**

In [10]:
FILENAME_CONFIG_FILE = "config_file.ini"

config_file_list_path = []

for download_directory_path, seabass_file_path, window_size, window_time in zip(download_dir_list_path, seabass_file_list_path, WINDOW_SIZE_LIST, WINDOW_TIME_LIST):
    
    config_file_path = os.path.join(download_directory_path,FILENAME_CONFIG_FILE)
    config_file_list_path.append(config_file_path)
    
    # Each of the following sections has its own README file in ./code/internal/ThoMaS
    config_params = {}
    
    # "global" section
    config_params['global'] = {}
    config_params['global']['path_output'] = download_directory_path
    config_params['global']['SetID'] = 'Endurance' # choose a name that reflects your data
    config_params['global']['workflow'] = 'insitu, SatData, minifiles, EDB, MDB'
    
    # "insitu" section
    config_params['insitu'] = {}
    config_params['insitu']['insitu_input'] = seabass_file_path
    config_params['insitu']['insitu_satelliteTimeToleranceSeconds'] = window_time # seconds
    
    # "satellite" section
    # SatData will contain the satellite granules (images), which are larger than the specified window 
    # size. Each SatData file contains different bands/products.
    config_params['satellite'] = {}
    config_params['satellite']['satellite_path-to-SatData'] = os.path.join(download_directory_path,'SatData')
    config_params['satellite']['satellite_source'] = 'EUMETSATdataStore'
    config_params['satellite']['satellite_collections'] = 'OL__L2M.003'
    config_params['satellite']['satellite_platforms'] = 'S3A, S3B' 
    config_params['satellite']['satellite_resolutions'] = 'FR' # NOTE: if searching for FR, don't choose RR (search already included in FR)
    
    # "minifiles" section (optional)
    # Extract minifiles from the listed/downloaded SatData centred at the queried lat/lon pairs
    # In this step, a minimisation of the distance to our desired lat/lon ranges is applied by taking 
    # the window size of our choice. In this step, the different SatData files (including different 
    # bands/products) are grouped.
    config_params['minifiles'] = {}
    config_params['minifiles']['minifiles_winSize'] = window_size
    
    # "EDB" section (optional)
    # Extraction Data Base (EDB) file compiling all extractions and statistics
    # This step stacks minifiles and calculates statistics of extraction following EUMETSAT's 
    # protocol. The EDB folder contains the .nc and .csv final files.
    config_params['EDB'] = {}
    config_params['EDB']['EDB_protocols_L2'] = 'EUMETSAT_standard_L2'
    config_params['EDB']['EDB_winSizes'] = window_size
    
    # "MDB" section (optional)
    # Matchups Data Base (MDB) file compiling all insitu-satellite matchup pairs and statistics
    # This step provides matchup statistics following EUMETSAT's protocol. This considers whether 
    # there are pixels in the window size that are non-valid (flagged), it checks for outlier pixels 
    # (anomalous values, based on mean and standard deviation), and other considerations.
    config_params['MDB'] = {}
    config_params['MDB']['MDB_time-interpolation'] = 'noTimeInterp'
    config_params['MDB']['MDB_stats_plots'] = True
    config_params['MDB']['MDB_stats_protocol'] = 'EUMETSAT_standard_L2'

    # Write config_params sections into config_file.ini
    def write_config_file(config_file_path,config_params):
        if os.path.exists(config_file_path):
            os.remove(config_file_path)
        with open(config_file_path, 'w') as text_file:
            for section,section_params in config_params.items():
                text_file.write('\n[%s]\n' % (section))
                for param, value in section_params.items():
                    text_file.write('%s: %s\n' % (param, value))

    write_config_file(config_file_path,config_params)
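
Before launching a long run, it can help to read the written file back and confirm its contents. The sketch below writes a small stand-in dictionary in the same `param: value` layout and parses it with the standard library's `configparser`. Whether ThoMaS itself parses the file this way is an assumption; the point is only to check what was written.

```python
import configparser
import tempfile
from pathlib import Path

# Small stand-in for the dictionary built above (values are illustrative)
demo_params = {
    "global": {"SetID": "Endurance", "workflow": "insitu, SatData, minifiles, EDB, MDB"},
    "minifiles": {"minifiles_winSize": 3},
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config_file.ini"

    # Same layout as write_config_file: a [section] line, then "param: value" lines
    with open(demo_path, "w") as text_file:
        for section, section_params in demo_params.items():
            text_file.write(f"\n[{section}]\n")
            for param, value in section_params.items():
                text_file.write(f"{param}: {value}\n")

    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original capitalisation of option names
    parser.read(demo_path)

sections = parser.sections()
workflow_steps = [step.strip() for step in parser["global"]["workflow"].split(",")]
win_size = parser.getint("minifiles", "minifiles_winSize")

print(sections)        # ['global', 'minifiles']
print(workflow_steps)  # ['insitu', 'SatData', 'minifiles', 'EDB', 'MDB']
print(win_size)        # 3
```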

## Run ThoMaS

In [None]:
%%time

for config_file_path in config_file_list_path:
    ThoMaS_main(config_file_path)

Building satellite_datasets file from specified options in config_file...

Step insitu
Creating IDB (in situ database) netcdf file and calculating timeRanges and satellite datasets from input SeaBASS/OCDB file (in situ)
creating IDB: Endurance
BRDF correction not required.

Step SatData
Downloading and/or creating lists of satellite data
S3A_OL_2_WFR____20180408T102022_20180408T102322_20210812T043022_0179_030_008______MAR_R_NT_003.SEN3: Downloading product!
S3A_OL_2_WFR____20180407T104633_20180407T104933_20210812T163734_0180_029_379______MAR_R_NT_003.SEN3: Downloading product!
S3A_OL_2_WFR____20170318T102657_20170318T102857_20210712T143732_0119_015_279______MAR_R_NT_003.SEN3: Downloading product!
S3A_OL_2_WFR____20180411T104248_20180411T104548_20210812T051424_0179_030_051______MAR_R_NT_003.SEN3: Downloading product!
S3A_OL_2_WFR____20180319T103904_20180319T104204_20210721T165151_0180_029_108______MAR_R_NT_003.SEN3: Downloading product!
S3A_OL_2_WFR____20170429T103756_20170429T103956_20
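
Because each configuration can take a long time to run, one option is to wrap the loop so that a failure in one configuration does not prevent the others from running. This is a sketch with a stand-in runner; `run_all` and `fake_runner` are hypothetical, and how `ThoMaS_main` reports errors is not covered here.

```python
def run_all(config_paths, runner):
    """Call runner(path) for each configuration and record the outcome per path."""
    results = {}
    for path in config_paths:
        try:
            runner(path)
            results[path] = "ok"
        except Exception as error:
            # Record the failure and continue with the remaining configurations
            results[path] = f"failed: {error}"
    return results

# Stand-in runner for demonstration; in the notebook this role is played by ThoMaS_main
def fake_runner(path):
    if "5x5_2h" in path:
        raise RuntimeError("simulated download error")

results = run_all(["cfg_3x3_2h.ini", "cfg_5x5_2h.ini", "cfg_5x5_3h.ini"], fake_runner)
for path, outcome in results.items():
    print(f"{path}: {outcome}")
```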

In [11]:
%%time

ThoMaS_main(config_file_list_path[3])

Building satellite_datasets file from specified options in config_file...

Step insitu
Creating IDB (in situ database) netcdf file and calculating timeRanges and satellite datasets from input SeaBASS/OCDB file (in situ)
creating IDB: Endurance
BRDF correction not required.

Step SatData
Downloading and/or creating lists of satellite data
S3A_OL_2_WFR____20180408T102022_20180408T102322_20210812T043022_0179_030_008______MAR_R_NT_003.SEN3: Downloading product!
S3A_OL_2_WFR____20180407T104633_20180407T104933_20210812T163734_0180_029_379______MAR_R_NT_003.SEN3: Downloading product!
S3A_OL_2_WFR____20170318T102657_20170318T102857_20210712T143732_0119_015_279______MAR_R_NT_003.SEN3: Downloading product!
S3A_OL_2_WFR____20180411T104248_20180411T104548_20210812T051424_0179_030_051______MAR_R_NT_003.SEN3: Downloading product!
S3A_OL_2_WFR____20180324T100908_20180324T101208_20210721T201627_0179_029_179______MAR_R_NT_003.SEN3: Downloading product!
S3A_OL_2_WFR____20180319T103904_20180319T104204_20
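
The product names in the log encode the platform and three timestamps (sensing start, sensing stop and product creation, following the Sentinel-3 file naming convention). The sketch below extracts them from one complete name copied from the output above; `parse_product_name` is an illustrative helper, not a ThoMaS function.

```python
import re
from datetime import datetime

TIMESTAMP_PATTERN = re.compile(r"\d{8}T\d{6}")

def parse_product_name(name):
    """Extract platform and timestamps from a Sentinel-3 product name."""
    stamps = [
        datetime.strptime(stamp, "%Y%m%dT%H%M%S")
        for stamp in TIMESTAMP_PATTERN.findall(name)
    ]
    if len(stamps) < 3:
        raise ValueError(f"Expected three timestamps in product name: {name}")
    return {
        "platform": name[:3],       # e.g. S3A or S3B
        "sensing_start": stamps[0],
        "sensing_stop": stamps[1],
        "created": stamps[2],
    }

product = "S3A_OL_2_WFR____20180408T102022_20180408T102322_20210812T043022_0179_030_008______MAR_R_NT_003.SEN3"
info = parse_product_name(product)
print(info["platform"], info["sensing_start"], info["sensing_stop"] - info["sensing_start"])
```

Comparing `sensing_start` with an in situ sampling time is a quick way to see how far inside the time tolerance a given granule falls.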