<img src='./img/DataStore_EUMETSAT.png'/>

Copyright (c) 2024 EUMETSAT <br>
License: MIT

<hr>

<a href="./index.ipynb">← Index</a>
<br>
<a href="./1_1_Discovering_collections.ipynb">← Discovering collections</a>&nbsp;&nbsp;&nbsp;<span style="float:right;"><a href="./1_3_Downloading_products.ipynb">Downloading products →</a>

# Searching and filtering products in a collection

## What will this module teach you?

This module will show you how to:<br>


1. **Filter Datasets**
2. **How to Use EUMDAC for Searching Products**
   - Filter by the Latest Product
   - Filter by the Satellite Type
   - Filter Datasets by (Rectangular) Area of Interest
   - Filter Datasets by (Polygonal) Area of Interest
   - Filter and Sort Datasets by Time
   - Filter Datasets by Publication Time

3. **List Found Datasets**

## 1. Filter Datasets
In this section we demonstrate the retrieval of datasets from a collection by applying filtering parameters.

We begin, as before, by importing the required modules. Note that we use the EUMDAC library, EUMETSAT's Python interface for sending requests to its data access APIs and handling their responses. We explain how to use this library throughout this tutorial.

**We need to install the library first!** An installation guide and further information about using EUMDAC can be found here: https://user.eumetsat.int/resources/user-guides/eumetsat-data-access-client-eumdac-guide

In [2]:
import datetime

import eumdac
import requests
# IPython.display is the public location of HTML and display
from IPython.display import HTML, display

Next, we authenticate with our personal credentials to generate an access token.

<div class="alert alert-block alert-success">
<b>NOTE:</b><br />
You can find your personal API credentials here: <a href="https://api.eumetsat.int/api-key/">https://api.eumetsat.int/api-key/</a>
</div>

In [3]:
# Insert your personal key and secret into the single quotes

consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'

credentials = (consumer_key, consumer_secret)

token = eumdac.AccessToken(credentials)

try:
    print(f"This token '{token}' expires {token.expiration}")
except requests.exceptions.HTTPError as exc:
    print(f"Error when sending the request to the server: '{exc}'")

This token '6951c87f-951c-3de5-bb5d-dbb366abd219' expires 2024-02-20 14:00:06.909733


Before we start, we have to select the collection we want to browse. For more information on finding out which collections are available, see the previous tutorial, <a href="./1_1_Discovering_collections.ipynb">Discovering collections</a>.

In [4]:
datastore = eumdac.DataStore(token)
selected_collection = datastore.get_collection('EO:EUM:DAT:METOP:IASSND02')

## 2. How to Use EUMDAC for Searching Products

Different collections can offer different search options. To find out which parameters we can use to define our search, we can inspect the collection's `search_options` property:

In [5]:
try:
    display(selected_collection.search_options)
except eumdac.datastore.DataStoreError as error:
    print(f"Error related to the data store: '{error.msg}'")
except eumdac.collection.CollectionError as error:
    print(f"Error related to the collection: '{error.msg}'")
except requests.exceptions.RequestException as error:
    print(f"Unexpected error: {error}")

{'bbox': {'title': 'Inventory which has a spatial extent overlapping this bounding box',
  'options': []},
 'geo': {'title': 'Inventory which has a spatial extent overlapping this Well Known Text geometry',
  'options': []},
 'title': {'title': 'Can be used to define a wildcard search on the product title (product identifier), use set notation as OR and space as AND operator between multiple search terms',
  'options': [None]},
 'sat': {'title': 'Mission / Satellite',
  'options': ['Metop-A', 'Metop-B', 'Metop-C']},
 'type': {'title': 'Product Type', 'options': ['IASSND02']},
 'dtstart': {'title': 'Temporal Start', 'options': []},
 'dtend': {'title': 'Temporal End', 'options': []},
 'publication': {'title': 'publication date', 'options': []},
 'zone': {'title': 'Equi7grid main continental zone',
  'options': ['NA', 'AN', 'OC', 'AS', 'EU', 'SA', 'AF']},
 't6': {'title': 'Equi7grid 600km tile', 'options': []},
 'orbit': {'title': 'Orbit Number, must be a positive integer', 'options': []}}

The response lists every parameter we can use in our search, together with a short description and, where applicable, its allowed values. These include start and end time, spatial extent, satellite and more.
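Since `search_options` appears to be a plain dictionary (as the output above suggests), we can also inspect it programmatically. The sketch below separates parameters that have a fixed set of allowed values from free-form ones, using an abridged excerpt of the output above as sample data. The helper `split_search_options` is hypothetical and not part of EUMDAC.

```python
def split_search_options(options):
    """Split search options into parameters with enumerated values and free-form ones.

    Hypothetical helper: assumes `options` maps each parameter name to a dict
    with an 'options' list, as in the search_options output above.
    """
    enumerated, free_form = {}, []
    for name, spec in options.items():
        # Drop placeholder entries such as the [None] used for the 'title' parameter
        values = [v for v in spec.get("options", []) if v is not None]
        if values:
            enumerated[name] = values
        else:
            free_form.append(name)
    return enumerated, free_form


# Abridged excerpt of the search_options output shown above (titles shortened)
sample_options = {
    "bbox": {"title": "Bounding box", "options": []},
    "title": {"title": "Wildcard search on the product title", "options": [None]},
    "sat": {"title": "Mission / Satellite", "options": ["Metop-A", "Metop-B", "Metop-C"]},
    "type": {"title": "Product Type", "options": ["IASSND02"]},
    "dtstart": {"title": "Temporal Start", "options": []},
}

enumerated, free_form = split_search_options(sample_options)
for name, values in enumerated.items():
    print(f"{name}: {', '.join(values)}")
print("Free-form:", ", ".join(free_form))
```

In the notebook, `split_search_options(selected_collection.search_options)` should work the same way, provided the property returns a dictionary of this shape.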

<div class="alert alert-block alert-success">
<b>NOTE:</b><br />
Find more information about EUMDAC errors, their causes and possible solutions, in our knowledge base: <a href="https://user.eumetsat.int/resources/user-guides/eumetsat-data-access-client-eumdac-guide#ID-Exception-handling">https://user.eumetsat.int/resources/user-guides/eumetsat-data-access-client-eumdac-guide#ID-Exception-handling</a>
</div>

### - Filter by the Latest Product

The simplest search returns the latest product of our selected collection. We only need to call `search()` without any filters and take the first result:

In [5]:
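try:
    # search() without filters matches every product in the collection;
    # first() returns the first (most recent) result. Calling both inside
    # the try block means request errors are caught below.
    latest = selected_collection.search().first()
    print(latest)
except eumdac.collection.CollectionError as error:
    print(f"Error related to the collection: '{error.msg}'")
except requests.exceptions.RequestException as error:
    print(f"Unexpected error: {error}")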

IASI_SND_02_M01_20240220084754Z_20240220102953Z_N_O_20240220095048Z


The `first()` function returns the first product of the search results. Because the products of a collection are sorted by date and time in descending order by default, `first()` gives us the latest product of the selected collection.
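If we need more than the single latest product, we can take the first *n* results instead. The sketch below assumes, as the `for product in products:` loop in the time-filter example further down suggests, that search results can be iterated in their sorted order. `latest_n` is a hypothetical helper. Here it is demonstrated with a plain list of product names that stands in for a real search result.

```python
from itertools import islice


def latest_n(results, n):
    """Return the first n items of an iterable of search results.

    Hypothetical helper: assumes results are ordered newest first (the default
    sort order), so the first n items are the n latest products.
    islice stops after n items instead of consuming the whole iterable.
    """
    return list(islice(results, n))


# Stand-in for selected_collection.search(): product names, newest first
fake_results = [
    "IASI_SND_02_M03_20211110113855Z_20211110131759Z_N_O_20211110132204Z",
    "IASI_SND_02_M01_20211110104459Z_20211110122659Z_N_O_20211110114332Z",
    "IASI_SND_02_M03_20211110095655Z_20211110113855Z_N_O_20211110114221Z",
]

for name in latest_n(fake_results, 2):
    print(name)
```

In the notebook, you would pass `selected_collection.search()` instead of the list.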

### - Filter by the Satellite Type

The cell below shows how to filter products by satellite.

**Parameters**
- **sat**: Mission / Satellite

In [6]:
satellite_type = "Metop-C"
products = selected_collection.search(sat=satellite_type)

try:
    print(f'Found Datasets: {products.total_results} datasets for the given satellite type.')
except eumdac.collection.CollectionError as error:
    print(f"Error related to the collection: '{error.msg}'")
except requests.exceptions.RequestException as error:
    print(f"Unexpected error: {error}")

Found Datasets: 22026 datasets for the given satellite type.


### - Filter Datasets by (Rectangular) Area of Interest

What if we want to refine our search to only cover a given area? In the cell below we search for products in our selected collection within a given geospatial rectangle. The rectangle is defined by two coordinates which represent its two opposing corners (bottom left, top right).

**Parameters**
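- **bbox**: corner points of the rectangular area of interest in EPSG:4326 decimal degrees, given as west (minimum longitude), south (minimum latitude), east (maximum longitude), north (maximum latitude)<br>(e.g. bbox=2.0,10.0,10.0,52.0)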

In [7]:
# Set bounding-box coordinates
area = '-11.78, 50.95, -2.78, 58.41'
# Retrieve datasets that match our filter
products = selected_collection.search(bbox=area)

In [8]:
try:
    print(f'Found Datasets: {products.total_results} datasets for the given area of interest')
except eumdac.collection.CollectionError as error:
    print(f"Error related to the collection: '{error.msg}'")
except requests.exceptions.RequestException as error:
    print(f"Unexpected error: {error}")

Found Datasets: 38841 datasets for the given area of interest
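To avoid mixing up the corner order, the bounding-box string can be built from named values. The sketch below assumes the west, south, east, north order (minimum longitude, minimum latitude, maximum longitude, maximum latitude) that the example area above follows. `make_bbox` is a hypothetical helper. It rejects boxes that cross the antimeridian rather than guessing how the Data Store handles them.

```python
def make_bbox(west, south, east, north):
    """Build a bbox string in the order west, south, east, north (EPSG:4326 degrees).

    Hypothetical helper: checks value ranges and corner order before the string
    is passed to search(bbox=...), so swapped corners are caught early.
    """
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie within [-180, 180]")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must lie within [-90, 90]")
    if south >= north:
        raise ValueError("south must be smaller than north")
    if west >= east:
        # Boxes crossing the antimeridian are deliberately not handled here
        raise ValueError("west must be smaller than east")
    return f"{west},{south},{east},{north}"


area = make_bbox(-11.78, 50.95, -2.78, 58.41)
print(area)
```

You can then pass the result as `selected_collection.search(bbox=area)`.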


### - Filter Datasets by (Polygonal) Area of Interest
We can do the same with a custom geometry. In the cell below, we search for products in our selected collection within a given geospatial polygon.<br>The polygon is defined by multiple coordinates, which represent the corners of its shape. Each vertex is a longitude latitude pair, and the last vertex repeats the first to close the ring.

**Parameters**
- **geo**: a custom geometry in [Well Known Text format](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry), using EPSG:4326 decimal degrees

In [7]:
# Add vertices for polygon, wrapping back to the start point.
geometry = "POLYGON((-1.0 -1.0,4.0 -4.0,8.0 -2.0,9.0 2.0,6.0 4.0,1.0 5.0,-1.0 -1.0))"
# Retrieve datasets that match our filter
products = selected_collection.search(geo=geometry)

In [11]:
try:
    print(f'Found Datasets: {products.total_results} datasets for the given area of interest.')
except eumdac.collection.CollectionError as error:
    print(f"Error related to the collection: '{error.msg}'")
except requests.exceptions.RequestException as error:
    print(f"Unexpected error: {error}")

Found Datasets: 21609 datasets for the given area of interest.
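Writing WKT by hand is error-prone, especially closing the ring. The sketch below builds the same polygon string used above from a list of vertices. `polygon_wkt` is a hypothetical helper, and it assumes each vertex is given as a (longitude, latitude) pair, with longitude first.

```python
def polygon_wkt(vertices):
    """Build a WKT POLYGON string from (longitude, latitude) pairs.

    Hypothetical helper: closes the ring by repeating the first vertex if the
    caller has not already done so, as WKT polygons require.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    ring = list(vertices)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    coords = ",".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"


vertices = [(-1.0, -1.0), (4.0, -4.0), (8.0, -2.0), (9.0, 2.0), (6.0, 4.0), (1.0, 5.0)]
print(polygon_wkt(vertices))
```

The printed string is identical to the `geometry` value above and could be passed as `selected_collection.search(geo=...)`.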


### - Filter and Sort Datasets by Time
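To limit our search temporally as well as spatially, and get only the products most relevant to us, we can add a sensing start and end time to our search parameters. The example below filters by time only. The spatial filters shown above can be passed to the same `search()` call to combine both.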

**Parameters**
- **dtstart**: sensing start date-time
- **dtend**: sensing end date-time
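- **pi**: ID of the collection (taken from `selected_collection`, so we do not pass it ourselves)
- **sort**: sort order, e.g. `start,time,1` for ascending or `start,time,0` for descending (the default)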

In [8]:
# Set sensing start and end time
start = datetime.datetime(2021, 11, 10, 8, 0)
end = datetime.datetime(2021, 11, 10, 12, 0)
# Retrieve datasets that match our filter
products = selected_collection.search(
    dtstart=start, 
    dtend=end,
    sort="start,time,1")

In [9]:
try:
    print(f'Found {products.total_results} datasets for the given time range:')
    for product in products:
        print(product)
except eumdac.collection.CollectionError as error:
    print(f"Error related to the collection: '{error.msg}'")
except requests.exceptions.RequestException as error:
    print(f"Unexpected error: {error}")

Found 7 datasets for the given time range:
IASI_SND_02_M03_20211110063559Z_20211110081455Z_N_O_20211110082404Z
IASI_SND_02_M01_20211110072355Z_20211110090555Z_N_O_20211110082048Z
IASI_SND_02_M03_20211110081455Z_20211110095655Z_N_O_20211110100156Z
IASI_SND_02_M01_20211110090555Z_20211110104459Z_N_O_20211110100530Z
IASI_SND_02_M03_20211110095655Z_20211110113855Z_N_O_20211110114221Z
IASI_SND_02_M01_20211110104459Z_20211110122659Z_N_O_20211110114332Z
IASI_SND_02_M03_20211110113855Z_20211110131759Z_N_O_20211110132204Z


The example above sorts the results in ascending (`1`) order; descending (`0`) is the default. Ascending order means the list shows products from oldest to newest.

*Note: Ascending or descending sorting only affects the order in which products are listed; it is not related to the orbit path (such as ascending or descending passes).*
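To check the ordering, we can read the sensing start time from each product name. The sketch below assumes the IASI naming pattern visible in the output above, where the fifth underscore-separated field is the sensing start. `sensing_start` is a hypothetical helper, and the first few names listed above serve as input. Naming patterns may differ in other collections, so the product's `sensing_start` property (used in section 3) is the more robust source.

```python
from datetime import datetime


def sensing_start(product_name):
    """Parse the sensing start time from an IASI L2 product name.

    Assumption: the fifth underscore-separated field holds the sensing start
    as YYYYMMDDhhmmssZ, as in the product names listed above.
    """
    return datetime.strptime(product_name.split("_")[4], "%Y%m%d%H%M%SZ")


# Product names from the ascending search above
names = [
    "IASI_SND_02_M03_20211110063559Z_20211110081455Z_N_O_20211110082404Z",
    "IASI_SND_02_M01_20211110072355Z_20211110090555Z_N_O_20211110082048Z",
    "IASI_SND_02_M03_20211110081455Z_20211110095655Z_N_O_20211110100156Z",
    "IASI_SND_02_M01_20211110090555Z_20211110104459Z_N_O_20211110100530Z",
]

starts = [sensing_start(n) for n in names]
# With sort="start,time,1" each product should start no earlier than the one before it
is_ascending = all(a <= b for a, b in zip(starts, starts[1:]))
print(is_ascending)
```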

### - Filter Datasets by Publication Time
Besides searching by sensing time, we can also search for products by their publication time. We specify a publication start and end time, and the search returns only the products published within that interval.

**Parameters**
- **geo**: a custom geometry in [Well Known Text format](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry), using EPSG:4326 decimal degrees
- **publication**: a date-time interval string in the mathematical range notation of the OGC standard
- **pi**: ID of the collection (taken from `selected_collection`)

In [10]:
# Set publication start and end time
publication_start = datetime.datetime(2021, 11, 10, 8, 0)
publication_end = datetime.datetime(2021, 11, 10, 12, 0)

# Format publication start and end time as ISO 8601 UTC strings (the trailing 'Z' marks UTC)
start = publication_start.strftime("%Y-%m-%dT%H:%M:%S.000Z")
end = publication_end.strftime("%Y-%m-%dT%H:%M:%S.000Z")

In [11]:
# Retrieve datasets that match our filter
products = selected_collection.search(
    publication=f"[{start},{end}]")

try:
    print(f'Found datasets: {products.total_results} datasets for the given publication time range')
except eumdac.collection.CollectionError as error:
    print(f"Error related to the collection: '{error.msg}'")
except requests.exceptions.RequestException as error:
    print(f"Unexpected error: {error}")

Found datasets: 6 datasets for the given publication time range


The example above shows how to filter for products published between two times. The string format follows the OGC standard ([OGC OpenSearch Extension for Earth Observation](https://docs.opengeospatial.org/is/13-026r9/13-026r9.html#20)), which uses mathematical notation for ranges and sets to define intervals:

```
n1          field = n1
{n1,n2,…}   field = n1 OR field = n2 OR …
[n1,n2]     n1 <= field <= n2
[n1,n2[     n1 <= field < n2
]n1,n2[     n1 < field < n2
]n1,n2]     n1 < field <= n2
[n1         n1 <= field
]n1         n1 < field
n2]         field <= n2
n2[         field < n2
```
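You can wrap the bracket rules above in a small function, so that open and closed bounds are not mixed up. The sketch below is a hypothetical helper (`publication_range`), not part of EUMDAC. It reuses the timestamp format from the earlier publication-time cell and assumes naive datetimes are already in UTC.

```python
from datetime import datetime, timezone

# Timestamp format used in the publication-time cell above
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.000Z"


def _to_utc_string(dt):
    """Format a datetime as a UTC timestamp; naive datetimes are assumed to be UTC."""
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    return dt.strftime(TIME_FORMAT)


def publication_range(start=None, end=None, include_start=True, include_end=True):
    """Build an OGC range string such as "[t1,t2]", "[t1,t2[" or the open-ended "[t1".

    Hypothetical helper, not part of EUMDAC. "[" / "]" on the left and "]" / "["
    on the right mark inclusive / exclusive bounds, following the table above.
    """
    if start is None and end is None:
        raise ValueError("at least one bound is required")
    parts = []
    if start is not None:
        parts.append(("[" if include_start else "]") + _to_utc_string(start))
    if end is not None:
        parts.append(_to_utc_string(end) + ("]" if include_end else "["))
    return ",".join(parts)


print(publication_range(datetime(2021, 11, 10, 8, 0), datetime(2021, 11, 10, 12, 0)))
print(publication_range(datetime(2021, 11, 10, 8, 0)))
```

For example, `selected_collection.search(publication=publication_range(publication_start))` would express the open-ended "published since" query shown below.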

This means that a filter for products published at or after a specific time can be expressed as follows:

In [16]:
products_since_date = selected_collection.search(
    geo=geometry,
    publication=f"[{start}") # Note the square bracket in front of the start time variable

try:
    print(f'Found Datasets: {products_since_date.total_results} datasets were published after the given time.')
except eumdac.collection.CollectionError as error:
    print(f"Error related to the collection: '{error.msg}'")
except requests.exceptions.RequestException as error:
    print(f"Unexpected error: {error}")

Found Datasets: 2320 datasets were published after the given time.


## 3. List Found Datasets

Now that we have identified the products that overlap with our area and/or time bounds of interest, we can display the properties of each product found by our most recent search (`products`) as follows.

In [17]:
for product in products:
    try:
        # f-strings convert each property to text before it is inserted into the HTML
        display(HTML(f'<b>{product}</b>'))
        display(HTML(f'<b>Orbit type:</b> {product.orbit_type}'))
        display(HTML(f'<b>Instrument:</b> {product.instrument}'))
        display(HTML(f'<b>Satellite:</b> {product.satellite}'))
        display(HTML(f'<b>Sensing start:</b> {product.sensing_start}'))
        display(HTML(f'<b>Sensing end:</b> {product.sensing_end}'))
        display(HTML(f'<b>Size:</b> {product.size}'))
        display(HTML('<b>Files:</b>'))
        # Each entry is the name of a file contained in the product
        for entry in product.entries:
            display(entry)
        print("----------------------------------------")
    except eumdac.product.ProductError as error:
        print(f"Error related to the product: '{error.msg}'")
    except requests.exceptions.RequestException as error:
        print(f"Unexpected error: {error}")

'IASI_SND_02_M01_20211110090555Z_20211110104459Z_N_O_20211110100530Z.nat'

'EOPMetadata.xml'

'manifest.xml'

----------------------------------------


'EOPMetadata.xml'

'manifest.xml'

'IASI_SND_02_M03_20211110081455Z_20211110095655Z_N_O_20211110100156Z.nat'

----------------------------------------
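Displaying properties one by one works well for a quick look. To compare many products, it can be easier to collect the same properties into rows. The sketch below assumes products expose the attributes used in the cell above (`satellite`, `instrument`, `sensing_start`, `sensing_end`, `size`). `product_rows` and the `StubProduct` stand-in are hypothetical. The stub carries only a name and a satellite, so the missing fields show up as `None`.

```python
class StubProduct:
    """Stand-in for a EUMDAC product, for illustration only (hypothetical)."""

    def __init__(self, name, satellite):
        self.name = name
        self.satellite = satellite

    def __str__(self):
        return self.name


def product_rows(products, fields=("satellite", "instrument", "sensing_start", "sensing_end", "size")):
    """Collect selected product properties into one dictionary per product.

    Hypothetical helper: getattr(..., None) records a missing property as None
    instead of raising an AttributeError.
    """
    rows = []
    for product in products:
        row = {"product": str(product)}
        for field in fields:
            row[field] = getattr(product, field, None)
        rows.append(row)
    return rows


stub = StubProduct("IASI_SND_02_M03_20211110081455Z_20211110095655Z_N_O_20211110100156Z", "Metop-C")
for row in product_rows([stub]):
    print(row)
```

If pandas is installed, `pandas.DataFrame(product_rows(products))` turns the rows into a table.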


We have now filtered our searches by time and/or space. In the next tutorial, we will download the product(s).


<hr>

<p style="text-align:left;">This project is licensed under the <a href="./LICENSE.TXT">MIT License</a> <span style="float:right;"><a href="https://gitlab.eumetsat.int/eumetlab/data-services/eumdac_data_store">View on GitLab</a> | <a href="https://classroom.eumetsat.int/">EUMETSAT Training</a> | <a href=mailto:ops@eumetsat.int>Contact</a></span></p>