![](https://github.com/destination-earth/DestinE-DataLake-Lab/blob/main/img/DestinE-banner.jpg?raw=true)



# DEDL - HDA Tutorial - Queryables

<br> Author: EUMETSAT <br>

<div class="alert alert-block alert-success">
<h3>How to use the queryables API</h3>
The queryables API returns a list of variable terms that can be used for filtering the specified collection.     

This notebook demonstrates how to filter data in a specific collection using the list of variable terms returned by the queryables API. 
</div>

Throughout this notebook, you will learn:

1. [Authenticate](#Authenticate): How to authenticate to search and access *DEDL* collections. 
2. [Queryables](#Queryables): How to exploit the STAC API filter extension features. The "queryables" API helps users to determine the property names and types available for filtering data.
3. [Search data](#Search):  How to search *DEDL* data using filters obtained by the "queryables" API.
4. [Download data](#Download): How to download *DEDL* data through HDA.


The detailed HDA API documentation, including the definition of each endpoint and its parameters, is available in the HDA Swagger UI at: 
[ STAC API - Filter Extension ](https://hda.data.destination-earth.eu/docs/#/STAC%20API%20-%20Filter%20Extension)

<div class="alert alert-block alert-warning">
<b> Prerequisites: </b>
<li> For queryables API: none </li>
<li> For filtering data inside collections : <a href="https://platform.destine.eu/"> DestinE user account</a> </li>
</div>

# Authenticate
## Define some constants for the API URLs
In this section, we define the relevant constants, holding the URL strings for the different endpoints.

In [1]:
# Collection https://hda.data.destination-earth.eu/ui/dataset/EO.ECMWF.DAT.CAMS_EUROPE_AIR_QUALITY_FORECASTS as default
COLLECTION_ID = "EO.ECMWF.DAT.CAMS_EUROPE_AIR_QUALITY_FORECASTS"

# Core API
HDA_API_URL = "https://hda.data.destination-earth.eu"

# STAC API
## Core
STAC_API_URL = f"{HDA_API_URL}/stac"

## Item Search
SEARCH_URL = f"{STAC_API_URL}/search"

## Collections
COLLECTIONS_URL = f"{STAC_API_URL}/collections"

## Queryables
QUERYABLES_URL = f"{STAC_API_URL}/queryables"
QUERYABLES_BY_COLLECTION_ID = f"{COLLECTIONS_URL}/{COLLECTION_ID}/queryables"
HDA_FILTERS = {}

## HTTP Success
HTTP_SUCCESS_CODE = 200

## Import the relevant modules and define some functions
We start off by importing the relevant modules for DestinE authentication, HTTP requests, JSON handling and widgets.

In [2]:
import destinelab as deauth

In [3]:
import requests
import json
from getpass import getpass

import ipywidgets as widgets
from IPython.display import display, clear_output, HTML
from ipywidgets import Layout, Box
import datetime

from rich.console import Console
import rich.table

from IPython.display import JSON


Below are some useful functions for pretty-printing and for demonstrating the queryables API.

In [4]:
# Function to display queryables in a table
def create_q_table(filters,params=None):
    table = rich.table.Table(title="Applicable filters", expand=True, show_lines=True)
    table.add_column("Description", style="cyan", justify="right")
    table.add_column("Type", style="violet", justify="right", no_wrap=True)
    table.add_column("enum", style="violet", justify="right")
    table.add_column("value", style="violet", justify="right", no_wrap=True)
    for filtername, details in filters.items():
        # When params are given, show only the filters that were selected
        if params is not None and filtername not in params:
            continue
        enum = ' , '.join(map(str, details["enum"])) if 'enum' in details else ''
        value = json.dumps(details["value"]) if 'value' in details else ''
        typeq = json.dumps(details["type"]) if 'type' in details else ''
        # Fall back to the filter name when no description is provided
        table.add_row(details.get("description", filtername), typeq, enum, value)
    return table

# Function to fetch queryable properties for a given collection with optional params
def fetch_queryables(collection_name, params=None, complete= False):
    url = f"{COLLECTIONS_URL}/{collection_name}/queryables"
    response = requests.get(url, params=params)
    if response.status_code != HTTP_SUCCESS_CODE:
        return None
    payload = response.json()
    # Return the whole document when complete=True, otherwise only 'properties'
    return payload if complete else payload.get('properties', {})

    
def update_dropdowns(collection_name, params=None):
    properties = fetch_queryables(collection_name, params)
    global HDA_FILTERS
    
    with output_area:
        clear_output()
        if properties:
            print("Properties fetched successfully.")
            #print(json.dumps(properties, indent=2))
            table=create_q_table(properties, params)
            console = Console()
            console.print(table)
            if (params!=None):
                print("The selected parameters translate into the following filters for the HDA query.\n")
                cleaned_params = {k: v for k, v in params.items() if v}
                HDA_FILTERS = {
                    key: {"eq": value}
                    for key, value in cleaned_params.items()
                }
                print(json.dumps(HDA_FILTERS, indent=4))
                print("\nFor parameters that were not selected, HDA applies the default values shown in the 'value' column above.")
        else:
            print("Failed to fetch properties.")
            return
        
    # Preserve existing selected values
    selected_values = {prop: dropdown.value for prop, dropdown in dropdowns.items()}
    
    # Clear existing dropdowns
    dropdown_container.children = []
    dropdowns.clear()
    
    # Create new dropdowns for properties with enum values
    new_dropdowns = []
    for prop, details in properties.items():
        if details.get('type') == 'string' and 'enum' in details:
            # Prepend an empty option so that the filter can be left unselected
            options = [''] + details['enum']

            # Keep the previous selection only if it is still an allowed option
            previous = selected_values.get(prop, '')
            dropdown = widgets.Dropdown(
                description=prop,
                options=options,
                value=previous if previous in options else ''
            )
            dropdown.observe(on_value_change, names='value')
            dropdowns[prop] = dropdown
            new_dropdowns.append(dropdown)
                
    if new_dropdowns:
        dropdown_container.children = new_dropdowns
    else:
        with output_area:
            print("No properties with enum values found.")

def on_fetch_button_clicked(b):
    collection_name = COLLECTION_ID    
    update_dropdowns(collection_name)

def on_value_change(change):
    collection_name = COLLECTION_ID
    params = {prop: dropdown.value for prop, dropdown in dropdowns.items() if dropdown.value is not None}
    
    if params:
        details = fetch_queryables(collection_name, params, complete = True)
    
        with output_area:
            clear_output()
            #print(json.dumps(details, indent=2))
    
    # Update dropdowns based on the new selection
    update_dropdowns(collection_name, params) 
    



## Obtain Authentication Token
To perform a query on HDA we need to be authenticated.

In [5]:
DESP_USERNAME = input("Please input your DESP username or email: ")
DESP_PASSWORD = getpass("Please input your DESP password: ")

auth = deauth.AuthHandler(DESP_USERNAME, DESP_PASSWORD)
access_token = auth.get_token()
if access_token is not None:
    print("DEDL/DESP Access Token Obtained Successfully")
else:
    print("Failed to Obtain DEDL/DESP Access Token")

auth_headers = {"Authorization": f"Bearer {access_token}"}

Please input your DESP username or email:  eum-dedl-user
Please input your DESP password:  ········


Response code: 200
DEDL/DESP Access Token Obtained Successfully


# Queryables

The "queryables" API helps users to determine the property names and types available for filtering data inside a specific collection.

The dropdown menu below lets you choose the desired collection.

In [6]:
# Event listeners
def on_change(change):
    with output_area:
        clear_output()
        print(f'Selected: {change["new"]}')
        print('---------------------------------------------')
        global COLLECTION_ID, QUERYABLES_BY_COLLECTION_ID
        # The new dropdown value is already the collection id string
        COLLECTION_ID = change["new"]
        QUERYABLES_BY_COLLECTION_ID = f"{COLLECTIONS_URL}/{COLLECTION_ID}/queryables"
        
        product_types = requests.get(COLLECTIONS_URL).json()['collections'] 
        index = next((i for i, d in enumerate(product_types) if d.get('id') == COLLECTION_ID), None)
        
        print("TITLE: "+product_types[index]['title'])
        print("DESCRIPTION: "+product_types[index]['description'])
        print("\nQUERYABLES ENDPOINT: \n"+QUERYABLES_BY_COLLECTION_ID)

options = [product_type["id"] for product_type in requests.get(COLLECTIONS_URL).json()['collections']]

# Widgets
output_area = widgets.Output()

dropdown = widgets.Dropdown(
    options=options,
    value=options[0],
    description="Collections:",
    disabled=False,
) 
dropdown.observe(on_change, names='value')


# Layout

# Define the layout for the dropdown
dropdown_layout = Layout(display='flex', justify_content='center', width='90%')
# Create a box to hold the dropdown with the specified layout
box = Box([dropdown, output_area], layout=dropdown_layout)
display( box)  




Box(children=(Dropdown(description='Collections:', options=('EO.CLMS.DAT.CORINE', 'EO.CLMS.DAT.GLO.DMP300_V1',…

The **QUERYABLES ENDPOINT** (above) shows the applicable filters under the section named 'properties'.

The 'properties' section contains, for each filter:
* a **description**, giving the name of the filter,
* the filter **type**,
* the possible filter values, **enum** (conditioned by the values selected for the other filters),
* and the default (or chosen) **value** applied.
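
As a rough illustration of this structure, the sketch below walks a hand-written 'properties' dictionary and extracts the four fields listed above. The sample payload and the helper name `summarise_properties` are assumptions made for the example; a real response from the queryables endpoint will contain different filters and values.

```python
# Hypothetical 'properties' section, shaped like the fields described above.
sample_properties = {
    "variable": {
        "description": "variable",
        "type": "string",
        "enum": ["temperature", "geopotential"],
        "value": "temperature",
    },
    "leadtime_month": {
        "description": "leadtime_month",
        "type": "string",
        "enum": ["1", "2", "3"],
    },
}


def summarise_properties(properties):
    """Return one (description, type, enum, value) tuple per filter."""
    rows = []
    for name, details in properties.items():
        rows.append(
            (
                details.get("description", name),  # fall back to the key
                details.get("type", ""),
                details.get("enum", []),
                details.get("value"),  # None when no default is given
            )
        )
    return rows


for row in summarise_properties(sample_properties):
    print(row)
```

Filters without a `value` entry come back with `None` in the last position, which makes it easy to spot the ones that have no default.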

We can print the 'properties' section for the selected collection in the table below.
The table shows the filters and the values applied by default when we perform a search for the chosen dataset.

In [7]:
filters_resp=requests.get(QUERYABLES_BY_COLLECTION_ID)
filters = filters_resp.json()["properties"]
table=create_q_table(filters)
console = Console()
console.print(table)

When the queryables API is called with the values chosen for some filters as parameters, it replies with the filters that remain applicable given those values. In other words, selecting a value for one variable narrows down the choices available for the others.

In this way, the queryables API helps users build a valid search request for the given dataset.
The interactive example below shows how selecting a value for one property narrows down the choices for the other properties. 

In [8]:
print("Chosen collection: "+COLLECTION_ID)

fetch_button = widgets.Button(description="Fetch queryables")
dropdown_container = widgets.VBox()
output_area = widgets.Output()

dropdowns = {}

# Event listeners
fetch_button.on_click(on_fetch_button_clicked)

# Layout
display(fetch_button, dropdown_container, output_area)

Chosen collection: EO.ECMWF.DAT.SEASONAL_FORECAST_MONTHLY_STATISTICS_ON_PRESSURE_LEVELS_2017_PRESENT


Button(description='Fetch queryables', style=ButtonStyle())

VBox()

Output()
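
The same narrowing can be inspected without widgets. The sketch below builds the URL of a conditioned queryables request by appending the chosen values as query parameters, much as `fetch_queryables` does through `requests.get(url, params=params)`, and then compares the `enum` lists of two responses. Dropping empty selections before encoding is a choice made here, the two property dictionaries are hand-written stand-ins for real responses, and `queryables_url` and `narrowed_options` are hypothetical helper names.

```python
from urllib.parse import urlencode

COLLECTIONS_URL = "https://hda.data.destination-earth.eu/stac/collections"


def queryables_url(collection_id, params=None):
    """Build the queryables URL, optionally conditioned by chosen values."""
    url = f"{COLLECTIONS_URL}/{collection_id}/queryables"
    # Drop empty selections before encoding the query string
    chosen = {k: v for k, v in (params or {}).items() if v}
    return f"{url}?{urlencode(chosen)}" if chosen else url


def narrowed_options(before, after):
    """For each filter, list the enum values that are no longer offered."""
    removed = {}
    for name, details in before.items():
        remaining = set(after.get(name, {}).get("enum", []))
        dropped = [v for v in details.get("enum", []) if v not in remaining]
        if dropped:
            removed[name] = dropped
    return removed


# Hand-written stand-ins for an unconditioned and a conditioned response
unfiltered = {"variable": {"enum": ["a", "b"]}, "level": {"enum": ["1", "2", "3"]}}
filtered = {"variable": {"enum": ["a"]}, "level": {"enum": ["1", "2"]}}

print(queryables_url("EO.ECMWF.DAT.CAMS_EUROPE_AIR_QUALITY_FORECASTS",
                     {"variable": "a", "level": ""}))
print(narrowed_options(unfiltered, filtered))
```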

## Filtering a collection with the list returned by the queryables API

This section explains how to use the list of variable terms returned by the queryables API to filter a specific dataset. 

### Build the query from the selected values
The parameters chosen in the previous steps can be used to build the corresponding HDA queries.

In [9]:
# The JSON objects containing the generic query parameters:
json1 = '{"collections": ["EO.ECMWF.DAT.CAMS_EUROPE_AIR_QUALITY_FORECASTS"], "datetime": "2024-04-01T00:00:00Z/2024-04-19T00:00:00Z"}'

# Convert JSON strings to Python dictionaries
dict1 = json.loads(json1)

# Include the filters selected in the previous steps inside the JSON containing the generic query parameters:
dict1['query'] = HDA_FILTERS

# Convert the merged dictionary back to a JSON string
query_json = json.dumps(dict1, indent=4)

print(query_json)

{
    "collections": [
        "EO.ECMWF.DAT.CAMS_EUROPE_AIR_QUALITY_FORECASTS"
    ],
    "datetime": "2024-04-01T00:00:00Z/2024-04-19T00:00:00Z",
    "query": {
        "system": {
            "eq": "51"
        },
        "variable": {
            "eq": "temperature"
        },
        "pressure_level": {
            "eq": "30"
        },
        "leadtime_month": {
            "eq": "5"
        }
    }
}
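
The structure printed above follows a simple pattern: each selected parameter becomes an equality condition under the `query` key. A minimal sketch of that translation, mirroring the logic inside `update_dropdowns`, might look like this. The function name `build_search_body` is an assumption made for illustration, and the sample selection is not checked against the collection's actual queryables.

```python
import json


def build_search_body(collection_id, datetime_range, selected):
    """Combine generic search parameters with equality filters."""
    # Ignore dropdowns left on the empty option
    filters = {key: {"eq": value} for key, value in selected.items() if value}
    body = {"collections": [collection_id], "datetime": datetime_range}
    if filters:
        body["query"] = filters
    return body


body = build_search_body(
    "EO.ECMWF.DAT.CAMS_EUROPE_AIR_QUALITY_FORECASTS",
    "2024-04-01T00:00:00Z/2024-04-19T00:00:00Z",
    {"variable": "temperature", "pressure_level": "30", "system": ""},
)
print(json.dumps(body, indent=4))
```

Leaving the `query` key out entirely when nothing is selected keeps the request equivalent to a plain collection and date search.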


## Search

In [10]:
response = requests.post(SEARCH_URL, headers=auth_headers, json=json.loads(query_json))
#print(response)
# Requests to ADS data always return a single item containing all the requested data
product = response.json()["features"][0]
JSON(product, expanded= False)

<IPython.core.display.JSON object>
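
The cell above indexes `features[0]` directly, which raises an `IndexError` when the search matches nothing. A defensive variant, sketched here on plain dictionaries rather than a live response, could check for an empty result first. `first_feature` is a hypothetical helper, and the two dictionaries are hand-written stand-ins for the JSON body of a search response.

```python
def first_feature(search_result):
    """Return the first STAC item of a search result, or None if empty."""
    features = search_result.get("features") or []
    return features[0] if features else None


# Hand-written stand-ins for the JSON body of a search response
empty_result = {"type": "FeatureCollection", "features": []}
one_result = {"type": "FeatureCollection", "features": [{"id": "item-1"}]}

print(first_feature(empty_result))
print(first_feature(one_result))
```

In the notebook this could be used as `product = first_feature(response.json())`, followed by a check for `None` before reading the assets.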

# Download
Once we have found the product we can obtain the URL for downloading it:


In [11]:
# DownloadLink is an asset representing the whole product
download_url = product["assets"]["downloadLink"]["href"]
print(download_url )

https://hda.data.destination-earth.eu/stac/collections/EO.ECMWF.DAT.CAMS_EUROPE_AIR_QUALITY_FORECASTS/items/CAMS_EU_AIR_QUALITY_FORECAST_20240401_20240418_d0b559c3056351d839571264db5ebdcef9a4deaa/download?provider=copernicus_atmosphere_data_store&_dc_qs=%257B%2522date%2522%253A%2B%25222024-04-01%252F2024-04-18%2522%252C%2B%2522format%2522%253A%2B%2522grib%2522%252C%2B%2522leadtime_hour%2522%253A%2B0%252C%2B%2522level%2522%253A%2B0%252C%2B%2522model%2522%253A%2B%2522ensemble%2522%252C%2B%2522pressure_level%2522%253A%2B30%252C%2B%2522system%2522%253A%2B51%252C%2B%2522time%2522%253A%2B%252200%253A00%2522%252C%2B%2522type%2522%253A%2B%2522forecast%2522%252C%2B%2522variable%2522%253A%2B%2522temperature%2522%257D
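
The `_dc_qs` query parameter in the printed URL appears to carry a doubly URL-encoded JSON object echoing the request parameters. This is an observation about the URL above rather than documented HDA behaviour, so treat the sketch below as exploratory. It decodes a shortened URL of the same shape using only the standard library; the host `example.invalid` and the helper name `decode_dc_qs` are placeholders.

```python
import json
from urllib.parse import parse_qs, unquote_plus, urlsplit

# Shortened, hypothetical URL shaped like the download link printed above
sample_url = (
    "https://example.invalid/download"
    "?provider=copernicus_atmosphere_data_store"
    "&_dc_qs=%257B%2522format%2522%253A%2B%2522grib%2522%252C%2B"
    "%2522system%2522%253A%2B51%257D"
)


def decode_dc_qs(url):
    """Decode the doubly URL-encoded JSON carried by '_dc_qs', if present."""
    query = parse_qs(urlsplit(url).query)  # first round of decoding
    if "_dc_qs" not in query:
        return None
    # Second round: '+' becomes a space and the remaining %XX are resolved
    return json.loads(unquote_plus(query["_dc_qs"][0]))


print(decode_dc_qs(sample_url))
```

Decoding the parameter this way can help confirm which filter values ended up in the request behind a given download link.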
