<img src='./img/DataStore_EUMETSAT.png'/>

Copyright (c) 2024 EUMETSAT <br>
License: MIT

<hr>

<a href="./index.ipynb">← Index</a>
<br>
<a href="./3_Data_Tailor_standalone.ipynb">← Using the EUMETSAT Data Tailor Standalone with EUMDAC</a>

# The EUMDAC Cookbook

Welcome to the EUMDAC Cookbook, a collection of practical code examples for accessing and processing data from EUMETSAT's Data Access Services with the EUMDAC Python library. It is aimed at researchers, developers, and analysts who want working recipes for common data access tasks.

## How to Use This Cookbook
Each recipe in this cookbook is designed to be self-contained, providing you with the necessary context, code, and explanations to understand and implement the solution effectively. You can navigate through them to find specific solutions or browse through the recipes to familiarize yourself with the library's capabilities.

## Table of Contents
0. [Setup](#0.-Setup)
1. [Download products by daily time range](#1.-Download-products-by-daily-time-range)
    - [Adaption: Download only the first product of a defined hour each day](#1.1-Adaption:-Download-only-the-first-product-of-a-defined-hour-each-day)
2. [Verify downloaded products from Data Store](#2.-Verify-downloaded-products-from-Data-Store)
3. [Store original product names while customising the product](#3.-Store-original-product-names-while-customising-the-product)

## Recipes

### 0. Setup

All recipes need a few general packages and must authenticate against our Data Access Services APIs with your personal consumer key and consumer secret. More information about authentication can be found in the [Discovering Collections](./1_Discovering_collections.ipynb#Authentication) notebook within this repository.

In [None]:
import eumdac # library for the EUMETSAT Data Access Services
import datetime # to ease handling date formats
import shutil # used for downloading products to local disk

# Generate an access token with your personal Consumer Key and Consumer Secret
token = eumdac.AccessToken(('INSERT_CONSUMER_KEY_HERE', 'INSERT_CONSUMER_SECRET_HERE'))

# Access token is used to authenticate with our Data Store APIs
datastore = eumdac.DataStore(token)

# Access token is used to authenticate with our Data Tailor APIs
datatailor = eumdac.DataTailor(token)
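
If you prefer not to paste credentials into the notebook, one option is to read them from environment variables. The helper below is a minimal sketch: the function name `load_credentials` and the variable names `EUMDAC_CONSUMER_KEY` and `EUMDAC_CONSUMER_SECRET` are assumptions made for illustration, not something the library prescribes.

```python
import os

def load_credentials(key_var="EUMDAC_CONSUMER_KEY", secret_var="EUMDAC_CONSUMER_SECRET"):
    """Return (consumer_key, consumer_secret) read from environment variables.

    The variable names are hypothetical; choose whatever fits your setup.
    """
    key = os.environ.get(key_var)
    secret = os.environ.get(secret_var)
    if not key or not secret:
        raise RuntimeError(
            f"Set the environment variables {key_var} and {secret_var} first."
        )
    return key, secret

# Possible usage with the setup cell above (requires valid credentials):
# token = eumdac.AccessToken(load_credentials())
```

The tuple returned by the helper has the same shape as the one passed to `eumdac.AccessToken` in the setup cell.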

### 1. Download products by daily time range

The script below searches for all HRSEVIRI products between two dates, but only downloads the products from defined hours, e.g. products from 08:00, 10:00 and 12:00.

In [None]:
# Define the collection ID you want to filter products in
collectionID = 'EO:EUM:DAT:MSG:HRSEVIRI'
selected_collection = datastore.get_collection(collectionID)

# Define the search parameters
start = datetime.datetime(2024, 1, 1) # Start date for product search
end = datetime.datetime(2024, 1, 3) # End date for product search
delta = datetime.timedelta(days=1) # Define steps to check between the start and end date, e.g. days=1 for check every day, hours=1 for every hour, weeks=2 for bi-weekly
hours = ['08','10','12'] # Define the hours per day you want to download products from, e.g. ['10','11'] to download all products of a day sensed between 10:00 and 12:00.

total_products = 0

# Go through every day between the defined start and end date
for i in range((end - start).days):
    for hour in hours:
        # For every hour, defined in `hours`, we are doing a product search request to 
        # the Data Store OpenSearch API and using the `title` filter and filename pattern
        # to search for the desired daily time windows.
        date = start + i*delta
        products = selected_collection.search(
            dtstart=start,
            dtend=end,
            title=f"*-{(date).strftime('%Y%m%d')}{hour}****.*")
        
        total_products = total_products + products.total_results
        
        print(f"{products.total_results} products available for requested daily time window ({hour}) on {date.day}/{date.month}/{date.year}.")
        
        # Download the found products
        for product in products:
            print(f"Start to download product {product} from {product.sensing_end}.")
            with product.open() as fsrc, \
                    open(fsrc.name, mode='wb') as fdst:
                shutil.copyfileobj(fsrc, fdst)
            print(f'Download of product {product} finished.')
        
print(f"{total_products} products were downloaded for requested daily time window ({', '.join(hours)}) from {start.day}/{start.month}/{start.year} to {end.day}/{end.month}/{end.year}.")
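
To see which `title` patterns the loop above sends to the search, the pattern construction can be reproduced without any network access. This is a small sketch; the helper name `build_title_patterns` is made up for illustration. Note that the end date is exclusive: with a start of 1 January and an end of 3 January, patterns are built for 1 and 2 January only.

```python
import datetime

def build_title_patterns(start, end, hours):
    """Return one title wildcard pattern per day and hour, mirroring the recipe above."""
    patterns = []
    for i in range((end - start).days):
        date = start + datetime.timedelta(days=i)
        for hour in hours:
            patterns.append(f"*-{date.strftime('%Y%m%d')}{hour}****.*")
    return patterns

patterns = build_title_patterns(
    datetime.datetime(2024, 1, 1),
    datetime.datetime(2024, 1, 3),
    ['08', '10', '12'],
)
for pattern in patterns:
    print(pattern)
```

Printing the patterns before running the downloads is a cheap way to check that the date range and the hours are what you intended.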

### 1.1 Adaption: Download only the first product of a defined hour each day

The script below searches for all HRSEVIRI products sensed in January 2024, but for each day it downloads only the first product sensed from 12:00 onwards, i.e. one product per day. Add further hours to `hours` to also download the first product of each of those hours.

In [None]:
# Define the collection ID you want to filter products in
collectionID = 'EO:EUM:DAT:MSG:HRSEVIRI'
selected_collection = datastore.get_collection(collectionID)

# Define the search parameters
start = datetime.datetime(2024, 1, 1) # Start date for product search
end = datetime.datetime(2024, 2, 1) # End date for product search
delta = datetime.timedelta(days=1) # Define steps to check between the start and end date, e.g. days=1 for check every day, hours=1 for every hour, weeks=2 for bi-weekly
hours = ['12'] # Define the hours per day you want to download products from, e.g. ['10','12','14'] to download the first product from after 10:00, 12:00 and 14:00.

total_products = 0

# Go through every day between the defined start and end date
for i in range((end - start).days):
    for hour in hours:
        # For every hour defined in `hours`, we send a product search request to
        # the Data Store OpenSearch API and use the `title` filter and filename pattern
        # to search for all products sensed within that hour, sorted from oldest to newest.
        # With `.first()` we get only the first file in the list. Due to our sorting,
        # it's the oldest one, i.e. the first sensed after 12:00.
        date = start + i*delta
        product = selected_collection.search(
            dtstart=start,
            dtend=end,
            title=f"*-{(date).strftime('%Y%m%d')}{hour}****.*",
            sort="start,time,1").first()

        # Skip this time window if the search did not return a product
        if product is None:
            print(f"No product found for hour {hour} on {date.day}/{date.month}/{date.year}.")
            continue

        total_products = total_products + 1

        # Download the found product
        print(f"Start to download product {product} from {product.sensing_end}.")
        with product.open() as fsrc, \
                open(fsrc.name, mode='wb') as fdst:
            shutil.copyfileobj(fsrc, fdst)
        print(f'Download of product {product} finished.')
        
print(f"{total_products} products were downloaded for requested daily time window ({', '.join(hours)}) from {start} to {end}.")
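
The selection logic of this adaptation (sort ascending, then take the first hit) can be illustrated locally with plain `datetime` objects. The sketch below does not call the Data Store; the helper name `first_in_hour` and the sensing times are invented for illustration.

```python
import datetime

def first_in_hour(sensing_times, day, hour):
    """Return the earliest time on `day` whose hour equals `hour`, or None if there is none."""
    candidates = sorted(
        t for t in sensing_times if t.date() == day and t.hour == hour
    )
    return candidates[0] if candidates else None

# Made-up sensing times, for illustration only
times = [
    datetime.datetime(2024, 1, 1, 12, 30),
    datetime.datetime(2024, 1, 1, 12, 0),
    datetime.datetime(2024, 1, 1, 11, 45),
    datetime.datetime(2024, 1, 2, 12, 15),
]

print(first_in_hour(times, datetime.date(2024, 1, 1), 12))
```

Returning `None` when no candidate exists mirrors the case where a search finds no product for a given day and hour, which is worth handling before attempting a download.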

### 2. Verify downloaded products from Data Store

The script below verifies files downloaded from the Data Store via their MD5 hash, to check whether a downloaded file is complete or corrupted. It compares the MD5 hash value of the downloaded file with the expected MD5 hash value that the Data Store API provides for each product.
If the check fails, meaning the MD5 hash values are not equal, the file is corrupt. This can happen due to download interruptions or network/connectivity issues, e.g. outages. When this happens, we recommend downloading the file again.

In [None]:
# package to generate MD5 of existing file
import hashlib

# Define the collection ID you want to filter products in
collectionID = 'EO:EUM:DAT:MSG:HRSEVIRI'
selected_collection = datastore.get_collection(collectionID)

# Define the search parameters
start = datetime.datetime(2024, 1, 1) # Start date for product search
end = datetime.datetime(2024, 1, 1, 0, 15) # End date for product search

products = selected_collection.search(
    dtstart=start,
    dtend=end)

for product in products:
    print(f'Start download of product {product}...')
    with product.open() as fsrc, \
            open(fsrc.name, mode='wb') as fdst:
        shutil.copyfileobj(fsrc, fdst)
        filename = fsrc.name  # remember the local file name for the integrity check
    print(f'Download of product {product} finished. Checking integrity...')
    
    expected_md5 = product.md5
    # Read the downloaded file inside a context manager so the file handle is closed again
    with open(filename, 'rb') as downloaded:
        actual_md5 = hashlib.md5(downloaded.read()).hexdigest()
    
    if expected_md5 == actual_md5:
        print("MD5 integrity check done successfully.")
    else:
        print(f"File seems to be corrupted. MD5s are not equal.\nPlease, download product {product} again.")
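
For large products, reading the whole file into memory just to hash it can be wasteful. The sketch below computes the same MD5 digest in chunks using only the standard library; the helper name `md5_of_file` and the default chunk size are illustrative choices.

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b'' (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# Possible usage in the recipe above:
# actual_md5 = md5_of_file(filename)
```

The result is identical to hashing the full file content at once, so it can be compared with `product.md5` in the same way.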

### 3. Store original product names while customising the product

By default, a customised product is named after its customisation ID. This makes it hard to tell which original product a customised file was derived from. The script below stores the original product ID (typically the filename of the original product) and uses it to name the downloaded customised product, which keeps the outputs traceable.

In [None]:
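import fnmatch # to filter the customisation outputs by filename pattern
import time # to wait between customisation status checks
import requests # to catch request errors during download and deletion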

# Define the collection ID you want to filter products in
collectionID = 'EO:EUM:DAT:MSG:HRSEVIRI'
selected_collection = datastore.get_collection(collectionID)

# Define the search parameters
start = datetime.datetime(2024, 1, 1) # Start date for product search
end = datetime.datetime(2024, 1, 1, 0, 15) # End date for product search

products = selected_collection.search(
    dtstart=start,
    dtend=end)

# Defining the chain configuration
chain = eumdac.tailor_models.Chain(
    product='HRSEVIRI',
    format='geotiff',
    filter={"bands" : ["channel_3","channel_2","channel_1"]}
)

for product in products:
    # Save metadata from original product to variables
    original_name = str(product)
    sensing_start = product.sensing_start
    sensing_end = product.sensing_end

    # Define final product file name
    filename = original_name + ".tif"

    # Start customisation
    customisation = datatailor.new_customisation(product, chain)
    status = customisation.status
    sleep_time = 10 # seconds
    
    # Customisation Loop
    while status:
        # Get the status of the ongoing customisation
        status = customisation.status
    
        if "DONE" in status:
            print(f"Customisation {customisation._id} is successfully completed.")
            break
        elif status in ["ERROR","FAILED","DELETED","KILLED","INACTIVE"]:
            print(f"Customisation {customisation._id} was unsuccessful. Customisation log is printed.\n")
            print(customisation.logfile)
            break
        elif "QUEUED" in status:
            print(f"Customisation {customisation._id} is queued.")
        elif "RUNNING" in status:
            print(f"Customisation {customisation._id} is running.")
        time.sleep(sleep_time)


    # Download customised product, but only if the customisation finished successfully
    jobID = customisation._id
    if "DONE" in status:
        tif, = fnmatch.filter(customisation.outputs, '*.tif')
        print(f"Downloading the TIF output of the customisation {jobID}")

        try:
            with customisation.stream_output(tif) as stream, \
                    open(filename, mode='wb') as fdst:
                shutil.copyfileobj(stream, fdst)
            print(f"Successfully downloaded the TIF output of the customisation {jobID}: {filename}")
        except requests.exceptions.RequestException as error:
            print(f"Unexpected error: {error}")

    # Delete customisation when it's downloaded
    try:
        customisation.delete()
    except requests.exceptions.RequestException as error:
        print("Unexpected error:", error)
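
The customisation loop above polls until the job finishes or fails, with no upper bound on the waiting time. A generic polling helper with a timeout could look like the sketch below. The function name `wait_for_customisation`, its parameters, and the default timeout are assumptions for illustration; the status names are the ones used in the recipe.

```python
import time

FAILED_STATES = {"ERROR", "FAILED", "DELETED", "KILLED", "INACTIVE"}

def wait_for_customisation(get_status, sleep_time=10, timeout=1800, sleep=time.sleep):
    """Poll `get_status()` until the job is done or has failed and return the final status.

    Raises TimeoutError if no final status is reached after roughly `timeout` seconds.
    """
    waited = 0
    while True:
        status = get_status()
        if "DONE" in status or status in FAILED_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"No final status after {waited} seconds (last status: {status}).")
        sleep(sleep_time)
        waited += sleep_time

# Possible usage in the recipe above:
# status = wait_for_customisation(lambda: customisation.status)
```

Passing the status lookup as a callable keeps the helper independent of the Data Tailor client, and the injectable `sleep` makes it easy to try out without actually waiting.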

<hr>


<p style="text-align:left;">This project is licensed under the <a href="./LICENSE.txt">MIT License</a> | <span style="float:right;"><a href="https://gitlab.eumetsat.int/eumetlab/data-services/">View on GitLab</a> | <a href="https://training.eumetsat.int/">EUMETSAT Training</a> | <a href=mailto:ops@eumetsat.int>Contact</a></span></p>