<table align='right'><tr>
<td style="padding:10px"><img src="resources/img/EC_POS.png" style="max-height:50px;width:auto;"/></td>
<td style="padding:10px"><img src="resources/img/ESA_logo_2020_Deep.png" style="max-height:40px;width:auto;"/></td>
<td style="padding:10px"><img src="resources/img/Copernicus_blue.png" style="max-height:60px;width:auto;"/></td>
<td style="padding:10px"><img src="resources/img/AIRBUS_Blue.png" style="max-height:30px;width:auto;"/></td>
<td style="padding:10px"><img src="resources/img/CS-GROUP.png" style="max-height:50px;width:auto;"/></td>
</tr></table>

<font color="#138D75">**Copernicus Reference System Python**</font> <br>
**Copyright:** Copyright 2024 ESA <br>
**License:** Apache License, Version 2.0 <br>
**Authors:** Airbus, CS Group

<div class="alert alert-block alert-success">
<h3>Copernicus Reference System Python tutorial for the ESA checkpoint 0.1</h3></div>

# Introduction

## Data used

In this notebook, we use simulated Auxip and Cadip data.

<mark>TO BE DEFINED: will we use real data from real stations for the checkpoint?</mark>

## Learning outcomes

At the end of this notebook you will know how to:
* Use the RS-Client Python library.
* Use it to:
    * Search individual Auxip files and Cadip chunk files from the reception stations.
    * Stage these files into the RS-Server STAC catalog.
    * Use most of the STAC catalog functionalities.
* Call Prefect flows to run parallel tasks to:
    * Stage multiple Auxip and Cadip files at once.
    * Run a simulated DPR processing on the staged files, and save results in the catalog.

## Outline

<div class="alert alert-info" role="alert">

## <a id='TOC_TOP'></a>Contents

</div>
    
 1. [Check your installation](#Check-your-installation) 
 1. [RsClient initialisation](#RsClient-initialisation)
 1. [Call services manually](#Call-services-manually)
     1. [Search Auxip and Cadip stations](#Search-Auxip-and-Cadip-stations)
     1. [Stage Auxip and Cadip files](#Stage-Auxip-and-Cadip-files)
     1. [Use the STAC catalog](#Use-the-STAC-catalog)
     1. [Search Cadip sessions](#Search-Cadip-sessions)
 1. [Prefect workflows](#Prefect-workflows)
     1. [Workflow: initialisation](#Workflow:-initialisation)
     1. [Workflow: stage Cadip files](#Workflow:-stage-Cadip-files)
     1. [Workflow: stage Auxip files](#Workflow:-stage-Auxip-files)
     1. [Workflow: DPR simulator](#Workflow:-DPR-simulator)

<hr>

<div class="alert alert-info" role="alert">

## Check your installation

In this section, we will check that your Jupyter Notebook environment is correctly set.

[Back to top](#Contents)

</div>

### `rs-client-libraries` installation

The `rs-client-libraries` Python library is the preferred way to access the RS-Server services from your environment. It is automatically installed in this notebook.

**Note**: you can ignore the following OpenTelemetry messages for now; they will be fixed in a later version:
```
Overriding of current TracerProvider is not allowed
Attempting to instrument while already instrumented
Transient error StatusCode.UNAVAILABLE encountered while exporting metrics to ..., retrying in ...s
Failed to export metrics to ..., error code: StatusCode.UNIMPLEMENTED
```

In [None]:
import rs_client
import rs_common
import rs_workflows

# Set logger level to info
import logging
rs_common.logging.Logging.level = logging.INFO

### Environment

In [None]:
import os

# In local mode, all your services are running locally.
# In hybrid or cluster mode, we use the services deployed on the RS-Server website.
# This configuration is set in an environment variable.
local_mode = (os.getenv("RSPY_LOCAL_MODE") == "1")

# In local mode, the service URLs are hardcoded in the docker-compose file
if local_mode:
    rs_server_href = None # not used
    RSPY_PREFECT_URL = "http://localhost:4200"
    RSPY_DPR_SIMU_URL = "http://dpr-simulator:8000"
    print (f"Auxip service: http://localhost:8001/docs")
    print (f"CADIP service: http://localhost:8002/docs")
    print (f"Catalog service: http://localhost:8003/api.html")
    print (f"MinIO dashboard (object storage): http://localhost:9101 with user=minio password=Strong#Pass#1234")
    print (f"Prefect dashboard (orchestrator): {RSPY_PREFECT_URL}")
    print (f"Grafana dashboard (logs, traces, metrics): http://localhost:3000/explore")

# In hybrid or cluster mode, they are set in environment variables
else:
    rs_server_href = os.environ["RSPY_WEBSITE"]
    RSPY_PREFECT_URL = os.environ['RSPY_PREFECT_URL']
    RSPY_DPR_SIMU_URL = os.environ["RSPY_DPR_SIMU_URL"]
    print (f"RS-Server website: {rs_server_href}")
    print (f"Create an API key: {rs_server_href}/docs#/API-Key%20Manager/create_api_key_apikeymanager_auth_api_key_new_get")
    print (f"Prefect dashboard (orchestrator): {RSPY_PREFECT_URL}")
    print (f"Grafana dashboard (logs, traces, metrics): {os.environ['RSPY_GRAFANA_URL']}")

### API key

In hybrid and cluster mode, you need an API key to access the RS-Server services. 

You must create one from the link displayed by the previous cell, then enter it manually in the cell below. 

If you prefer to load it automatically in all your notebooks, you can: 

  * From your JupyterHub workspace, open the text file `~/.rspy` <mark>(TODO: name to be defined)</mark>
  * Save your API key using this syntax:

    ```bash
    export RSPY_APIKEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx # replace by your value
    ```

  * Save and close the file.
  * <mark>TO BE CONFIRMED: Reload your JupyterHub session from Menu -> File -> Log Out / or just</mark>
  * <mark>Reload this notebook kernel from Menu -> Kernel -> Restart Kernel.</mark>
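
As a rough illustration of the `export` syntax above, the sketch below parses such lines from a text and returns them as a dictionary. The helper name `parse_export_lines` is hypothetical and is not part of `rs-client-libraries`; how the `~/.rspy` file is actually loaded is still to be defined.

```python
def parse_export_lines(text: str) -> dict[str, str]:
    """Parse shell-style 'export NAME=value' lines into a dictionary (hypothetical helper)."""
    variables = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        # Limitation of this sketch: a value containing '#' would be truncated.
        line = raw_line.split("#", 1)[0].strip()
        if not line.startswith("export "):
            continue
        name, sep, value = line[len("export "):].partition("=")
        if sep:
            # Remove optional quotes around the value
            variables[name.strip()] = value.strip().strip("'\"")
    return variables


sample = "export RSPY_APIKEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx # replace by your value\n"
print(parse_export_lines(sample))
```

The returned values could then be copied into `os.environ` before the API key cell runs.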

In [None]:
apikey = os.getenv("RSPY_APIKEY")
if (not local_mode) and (not apikey):
    import getpass
    apikey = getpass.getpass(f"Enter your API key:")
    os.environ["RSPY_APIKEY"] = apikey

### S3 buckets (object storage)

The temporary S3 bucket is used to download the Auxip and Cadip files. 

When we publish these files into the STAC catalog, they are moved from the temporary into the final S3 bucket.

In [None]:
# We use these bucket names that are deployed on the cluster. 
# RS-Server has read/write access to these buckets, but as an end-user, you won't manipulate them directly.
TEMP_S3_BUCKET = "rs-cluster-temp"
FINAL_S3_BUCKET = "rs-cluster-catalog"

# Except in local mode, where we use a local MinIO object storage instance.
# We need to manually create the buckets.
if local_mode:
    !pip install boto3
    from resources.utils import create_s3_buckets
    create_s3_buckets(TEMP_S3_BUCKET, FINAL_S3_BUCKET)

<div class="alert alert-info" role="alert">

## RsClient initialisation

Initialise Python RsClient class instances to access the RS-Server services.

[Back to top](#Contents)

</div>

In [None]:
from rs_client.rs_client import RsClient
from rs_common.config import ECadipStation

# Init a generic RS-Client instance. Pass the:
#   - RS-Server website URL
#   - API key
#   - ID of the owner of the STAC catalog collections.
#     By default, this is the user login from the keycloak account, associated with the API key.
#     Or, in local mode, this is the local system username.
#     Otherwise, your API key must grant you read/write rights on this catalog owner (see next cell).
#   - Logger (optional, a default one can be used)
generic_client = RsClient(rs_server_href, apikey, owner_id=None, logger=None)
print(f"Owner ID: {generic_client.owner_id!r}")

# From this generic instance, get an Auxip client instance
auxip_client = generic_client.get_auxip_client()

# Or get a Cadip client instance. Pass the cadip station.
cadip_station = ECadipStation.CADIP
cadip_client = generic_client.get_cadip_client(cadip_station)

# Or get a Stac client to access the catalog
stac_client = generic_client.get_stac_client()

In [None]:
# In hybrid or cluster mode, show the user login and IAM roles from the keycloak account, 
# associated with the api key
if not local_mode:
    user_login = generic_client.apikey_user_login
    iam_roles = "\n".join(generic_client.apikey_iam_roles)
    print(f"API key user login: {user_login!r} \nIAM roles: \n{iam_roles}")

<div class="alert alert-info" role="alert">

## Call services manually

In this section, we will see how to call these services manually: 

  * Search Auxip and Cadip reception stations for new files
  * Stage these files and check the staging status
  * STAC catalog services
  * Search Cadip sessions

[Back to top](#Contents)

</div>

In [None]:
# Do some initialisation
from datetime import datetime
import json
from time import sleep

from rs_common.config import EDownloadStatus, EPlatform

# Define a search interval
start_date = datetime(2010, 1, 1, 12, 0, 0)
stop_date = datetime(2024, 1, 1, 12, 0, 0)

# Timeout in seconds for the endpoints
TIMEOUT = 30

<div class="alert alert-info" role="alert">

### Search Auxip and Cadip stations

[Back to top](#Contents)

</div>

In [None]:
# Do this using the Auxip client (to find Auxip files) 
# then the Cadip client (to find Cadip chunk files)
for client in [auxip_client, cadip_client]:

    # Call the service to search the reception stations for new files in the date interval.
    files = client.search_stations(start_date, stop_date, TIMEOUT)
    
    file_count = len(files)
    assert file_count, f"We should have at least one {client.station_name} file"
    print (f"Found {file_count} {client.station_name} files\n")

    # Print the first file metadata. It is in the STAC format.
    print(f"First {client.station_name} file:\n{json.dumps(files[0], indent=2)}\n")

    # By default, the files are returned sorted by the most recent first (by creation date)
    ids="\n".join([f"{f['properties']['created']} - {f['id']}" for f in files[:10]])
    print(f"Most recent {client.station_name} IDs and datetimes:\n{ids}\n")

In [None]:
# We can sort by +/- any property, e.g. by creation date ascending = the oldest first
for client in [auxip_client, cadip_client]:    
    files = client.search_stations(start_date, stop_date, TIMEOUT, sortby="+created")
    ids="\n".join([f"{f['properties']['created']} - {f['id']}" for f in files[:10]])
    print(f"Oldest {client.station_name} IDs and datetimes:\n{ids}\n")
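
The `+property` / `-property` sort convention can also be reproduced on the client side, for example to re-sort results that were already downloaded. This is a minimal sketch, not the RS-Server implementation: `sort_items` is a hypothetical helper and the items are made up for illustration.

```python
def sort_items(items: list[dict], sortby: str) -> list[dict]:
    """Sort STAC-like items by a property, using a '+name' / '-name' sort string."""
    descending = sortby.startswith("-")
    name = sortby.lstrip("+-")
    # ISO 8601 timestamps in the same format sort chronologically as plain strings
    return sorted(items, key=lambda item: item["properties"][name], reverse=descending)


# Made-up items, for illustration only
items = [
    {"id": "b", "properties": {"created": "2021-06-01T00:00:00.000Z"}},
    {"id": "a", "properties": {"created": "2019-01-01T00:00:00.000Z"}},
    {"id": "c", "properties": {"created": "2023-03-15T00:00:00.000Z"}},
]
print([item["id"] for item in sort_items(items, "+created")])      # oldest first
print([item["id"] for item in sort_items(items, "-created")][:2])  # two most recent
```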

<div class="alert alert-info" role="alert">

### Stage Auxip and Cadip files

Staging a file means that RS-Server will:
1. Copy (=download) it from the reception station into the temporary S3 bucket.
1. Publish its metadata into the STAC catalog and move it from the temporary into the final S3 bucket.

[Back to top](#Contents)

</div>

In [None]:
temp_s3_files = []
for client in [auxip_client, cadip_client]:

    # When searching stations, we can also limit the number of returned results.
    # For this example, let's keep only one file.
    files = client.search_stations(start_date, stop_date, TIMEOUT, limit=1)
    assert len(files) == 1

    # We stage by filename = the file ID
    first_filename = files[0]["id"]

    # We must give a temporary S3 bucket path where to copy the file from the station.
    # Use our API key username to avoid conflicts with other users.
    s3_path = f"s3://{TEMP_S3_BUCKET}/{client.apikey_user_login}/{client.station_name}"
    temp_s3_files.append (f"{s3_path}/{first_filename}") # save it for later

    # We can also download the file locally to the server, but this is useful only in local mode
    local_path = None

    # Call the staging service
    client.staging(first_filename, TIMEOUT, s3_path=s3_path, tmp_download_path=local_path)

    # Then we can check when the staging has finished by calling the check status service
    while True:
        status = client.staging_status(first_filename, TIMEOUT)
        print (f"Staging status for {first_filename!r}: {status.value}")
        if status in [EDownloadStatus.DONE, EDownloadStatus.FAILED]:
            print("\n")
            break
        sleep(1)

    # WARNING: the file is copied into the temporary S3 bucket but is not yet published into the catalog
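
The polling loop above runs until the status becomes final, so it would never return if the staging got stuck. One possible variant, sketched here with the standard library only, adds an overall deadline. `wait_for` is a hypothetical helper, and the status strings are placeholders rather than the real `EDownloadStatus` values.

```python
import time
from typing import Callable, Collection


def wait_for(get_status: Callable[[], object], final_states: Collection,
             max_wait: float = 60.0, interval: float = 1.0) -> object:
    """Poll get_status() until it returns a final state or max_wait seconds elapse."""
    deadline = time.monotonic() + max_wait
    while True:
        status = get_status()
        if status in final_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status is still {status!r} after {max_wait} seconds")
        time.sleep(interval)


# Fake status source standing in for the staging status service
fake_statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "DONE"])
print(wait_for(lambda: next(fake_statuses), {"DONE", "FAILED"}, max_wait=5, interval=0.01))
```

In this notebook, the callable could be `lambda: client.staging_status(first_filename, TIMEOUT)` and the final states `[EDownloadStatus.DONE, EDownloadStatus.FAILED]`.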

<div class="alert alert-info" role="alert">

### Use the STAC catalog

The SpatioTemporal Asset Catalog (STAC) family of specifications aims to standardize the way geospatial asset metadata is structured and queried. 

A 'spatiotemporal asset' is any file that represents information about the Earth captured in a certain space and time. 

For more information, see: https://github.com/radiantearth/stac-api-spec/tree/main

In this section, we will see how to use most of the RS-Server STAC catalog functionalities.

<mark>TODO: this part must be completed and replaced by the StacClient</mark>

[Back to top](#Contents)

</div>

#### Create a new collection

In [None]:
import requests

COLLECTION = "my_collection"

collection = {
            "id": COLLECTION,
            "type": "Collection",
            "description": "This is my collection description",
            "stac_version": "1.0.0",
            "owner": stac_client.owner_id
        }

# Clean the existing collection, if any
requests.delete(f"{stac_client.href_catalog}/catalog/collections/{stac_client.owner_id}:{COLLECTION}", **stac_client.apikey_headers, stream=True)

# Create it from new
response = requests.post(f"{stac_client.href_catalog}/catalog/collections", json=collection, **stac_client.apikey_headers)
response.raise_for_status()

# See my personal catalog information
response = requests.get(f"{stac_client.href_catalog}/catalog/catalogs/{stac_client.owner_id}", **stac_client.apikey_headers)
response.raise_for_status()
print (f"My catalog information:\n{json.dumps (json.loads (response.content), indent=2)}")

# See my collection information
response = requests.get(f"{stac_client.href_catalog}/catalog/collections/{stac_client.owner_id}:{COLLECTION}", **stac_client.apikey_headers)
response.raise_for_status()
print (f"\nMy collection information:\n{json.dumps (json.loads (response.content), indent=2)}")
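
The catalog URLs used in this section all follow the same `<owner_id>:<collection>` pattern. A small helper can keep them consistent. This is only a sketch: `collection_url` is a hypothetical function, and the owner and collection names below are made up.

```python
from typing import Optional


def collection_url(href_catalog: str, owner_id: str,
                   collection: Optional[str] = None,
                   item_id: Optional[str] = None) -> str:
    """Build a catalog URL for the collections endpoint, a collection, or one of its items."""
    url = f"{href_catalog.rstrip('/')}/catalog/collections"
    if collection is not None:
        url += f"/{owner_id}:{collection}"
        if item_id is not None:
            url += f"/items/{item_id}"
    return url


# Made-up values, for illustration only
base = "http://localhost:8003"
print(collection_url(base, "alice"))
print(collection_url(base, "alice", "my_collection"))
print(collection_url(base, "alice", "my_collection", "my_item"))
```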

#### Add new items to the collection

In [None]:
# We will add one Auxip and one Cadip file that were staged from the previous steps
for temp_s3_file in temp_s3_files:

    # Let's use STAC item ID = filename
    print(f"\nAdd catalog item from: {temp_s3_file!r}")
    item_id = os.path.basename(temp_s3_file)
    
    item = {
                "id": item_id,
                "bbox": [-94.6334839, 37.0332547, -94.6005249, 37.0595608],
                "type": "Feature",
                "geometry": {
                    "type": "Polygon",
                    "coordinates": [
                        [
                            [-94.6334839, 37.0595608],
                            [-94.6334839, 37.0332547],
                            [-94.6005249, 37.0332547],
                            [-94.6005249, 37.0595608],
                            [-94.6334839, 37.0595608],
                        ]
                    ],
                },
                "collection": COLLECTION,
                "properties": {
                    "gsd": 0.5971642834779395,
                    "width": 2500,
                    "height": 2500,
                    "datetime": "2000-02-02T00:00:00Z",
                    "proj:epsg": 3857,
                    "orientation": "nadir",
                    "owner_id": stac_client.owner_id,
                },
                "stac_extensions": [],
                "assets": {
                    "file": {
                        "href": temp_s3_file,
                        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                        "title": "NOAA STORM COG",
                    },
                },
            }

    # WARNING: after this, the staged file is moved from the temporary into the final bucket,
    # so this cell can be run only once, or you'll need to stage the files again from the previous section.
    
    response = requests.post(f"{stac_client.href_catalog}/catalog/collections/{stac_client.owner_id}:{COLLECTION}/items", json=item, **stac_client.apikey_headers)
    response.raise_for_status()
    
    response = requests.get(f"{stac_client.href_catalog}/catalog/collections/{stac_client.owner_id}:{COLLECTION}/items/{item_id}", **stac_client.apikey_headers)
    response.raise_for_status()
    inserted_item = json.loads (response.content)
    print (f"Saved item in the catalog:\n{json.dumps (inserted_item, indent=2)}")
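
In the item above, the `bbox` and `geometry` fields are hardcoded and must stay consistent with each other. As a sketch, the bounding box can be derived from the polygon instead. `bbox_from_polygon` is a hypothetical helper, not part of the RS-Client library.

```python
def bbox_from_polygon(coordinates: list) -> list:
    """Return [min_lon, min_lat, max_lon, max_lat] for GeoJSON Polygon coordinates."""
    # Only the exterior ring (the first one) determines the bounding box
    exterior = coordinates[0]
    lons = [point[0] for point in exterior]
    lats = [point[1] for point in exterior]
    return [min(lons), min(lats), max(lons), max(lats)]


# Same polygon as in the item above
polygon = [
    [
        [-94.6334839, 37.0595608],
        [-94.6334839, 37.0332547],
        [-94.6005249, 37.0332547],
        [-94.6005249, 37.0595608],
        [-94.6334839, 37.0595608],
    ]
]
print(bbox_from_polygon(polygon))
```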

<mark>TODO: demonstrate the other catalog endpoints. Demonstrate the create_cql2_filter ? (used below)</mark>

<mark>TODO: print the links to download the files (requests.get(stac_client.href_catalog + f"/catalog/collections/{owner_id}:{mission}_aux/items/{feature['id']}/download/file"))</mark>

<div class="alert alert-info" role="alert">

### Search Cadip sessions

Each Cadip chunk file is attached to a single Cadip session.

We can search Cadip sessions by parameters, or find information about a specific session ID.

[Back to top](#Contents)

</div>

In [None]:
# Search cadip sessions by date interval and platforms
platforms = [EPlatform.S1A, EPlatform.S2B]
sessions = cadip_client.search_sessions(TIMEOUT, start_date=start_date, stop_date=stop_date, platforms=platforms)

session_count = len(sessions)
assert session_count, "We should have at least one Cadip session"
print (f"Found {session_count} Cadip sessions")

In [None]:
# Print the first cadip session metadata. It is in the STAC format.
print(f"First Cadip session:\n{json.dumps(sessions[0], indent=2)}\n")

In [None]:
# Print all the Cadip session IDs
ids=[s["id"] for s in sessions]
print_ids="\n".join(ids)
print(f"Cadip sessions IDs:\n{print_ids}")

# Save the first session ID for later
first_cadip_session = ids[0]

In [None]:
# We can also search Cadip sessions by specific session IDs,
# e.g. get information for the cadip sessions #2 and #3
search_ids=ids[1:3]
search_sessions = cadip_client.search_sessions(TIMEOUT, session_ids=search_ids)
print(f"Cadip sessions information:\n{json.dumps(search_sessions, indent=2)}\n")

<div class="alert alert-danger" role="alert">

### Exercises: call services manually

<mark>TODO: stage and push other files to the catalog, using another owner on which you have permissions. Print other session info.</mark>

NOTE: also use the endpoints from the Swagger UI

[Back to top](#Contents)

</div>

<div class="alert alert-info" role="alert">

## Prefect workflows
[Back to top](#Contents)
</div>

<div class="alert alert-info" role="alert">

### Workflow: initialisation

[Back to top](#Contents)

</div>

In [None]:
from datetime import timedelta

# The workflow will stage all files between a date interval.
# In this example, we will stage all the files of the first Cadip session.
# We extract the mission and date from the session ID: <mission>_YYYYmmdd<other_info>
session_id = first_cadip_session
print(f"First Cadip session ID: {session_id!r}")
mission, date_and_other = session_id.split("_")
date = datetime.strptime (date_and_other[:8], "%Y%m%d")
start_datetime = date # start from midnight
stop_datetime = date + timedelta(days=1) # midnight the day after

print(f"Mission: {mission!r}")
print(f"Date interval: '{start_datetime} -> {stop_datetime}'")
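
The cell above assumes that the session ID contains exactly one underscore. A slightly more defensive parsing of the `<mission>_YYYYmmdd<other_info>` format could look like the sketch below, where `parse_session_id` is a hypothetical helper written for this example.

```python
from datetime import datetime, timedelta


def parse_session_id(session_id: str) -> tuple[str, datetime, datetime]:
    """Split '<mission>_YYYYmmdd<other_info>' into (mission, day start, next day start)."""
    # Split on the first underscore only, so extra underscores don't raise an error
    mission, sep, rest = session_id.partition("_")
    if not sep or len(rest) < 8:
        raise ValueError(f"Unexpected session ID format: {session_id!r}")
    day = datetime.strptime(rest[:8], "%Y%m%d")
    return mission, day, day + timedelta(days=1)


mission_name, day_start, day_stop = parse_session_id("S1A_20200105072204051312")
print(f"Mission: {mission_name!r}, interval: {day_start} -> {day_stop}")
```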

In [None]:
# As a prerequisite, we must manually create the Auxip and Cadip 
# collections in the catalog, if they don't already exist.
from rs_workflows import staging

for client in [auxip_client, cadip_client]:
    collection_name = staging.create_collection_name(mission, client.station_name)

    # Save the collection name for later
    if client == auxip_client:
        auxip_collection = collection_name
    else:
        cadip_collection = collection_name

    # Try to get collection information    
    response = requests.get(f"{stac_client.href_catalog}/catalog/collections/{stac_client.owner_id}:{collection_name}", **stac_client.apikey_headers)

    # If it's a 404, this means that the collection doesn't exist, so create it
    if response.status_code == 404:
        
        collection = {
            "id": collection_name,
            "type": "Collection",
            "description": "This is my collection description",
            "stac_version": "1.0.0",
            "owner": stac_client.owner_id
        }
        print(f"Create collection: {collection_name!r}")
        response = requests.post(f"{stac_client.href_catalog}/catalog/collections", json=collection, **stac_client.apikey_headers)
        response.raise_for_status()
        
    else:
        print(f"Collection already exists: {collection_name!r}")

<div class="alert alert-info" role="alert">

### Workflow: stage Cadip files

[Back to top](#Contents)

</div>

In [None]:
# Note: for now, in hybrid mode, it's the localhost URL that is used
print(f"\nView Prefect flow runs from: {RSPY_PREFECT_URL}\n")

In [None]:
# Number of tasks to be run in parallel
MAX_WORKERS = 15

# Staging workflow configuration
config = staging.PrefectFlowConfig(
    cadip_client, 
    mission, 
    s3_path = f"s3://{TEMP_S3_BUCKET}/{cadip_client.owner_id}/{cadip_client.station_name}",
    tmp_download_path=None, # no local download
    max_workers=MAX_WORKERS,
    start_datetime=start_datetime,
    stop_datetime=stop_datetime,
    limit=None) # no limit on the number of files

# Start the prefect flow
staging.staging_flow(config)
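
The `max_workers` parameter caps the number of tasks that run at the same time. The sketch below illustrates this idea with the standard library only. It is not how `staging_flow` is implemented (the workflow relies on Prefect), and `fake_stage` is a made-up stand-in for the staging of one file.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_stage(filename: str) -> str:
    """Pretend to stage a file: wait a little, then report success."""
    time.sleep(0.01)
    return f"{filename}: DONE"


filenames = [f"file_{i}.raw" for i in range(10)]

# At most 3 files are processed at the same time; map() keeps the input order
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fake_stage, filenames))

print(f"{len(results)} files processed, first result: {results[0]!r}")
```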

In [None]:
from rs_workflows import s1_l0

# We can use this filter to find the staged Cadip files for this session ID
query = s1_l0.create_cql2_filter({"collection": f"{cadip_client.owner_id}_{cadip_collection}", "cadip:session_id": session_id})
response = requests.post(f"{stac_client.href_catalog}/catalog/search", json=query, **stac_client.apikey_headers)

response.raise_for_status()
files = json.loads (response.content)
print (f"\n{files['context']['returned']} Cadip files are staged for session ID: {session_id!r}.")
print (f"First one:\n{json.dumps (files['features'][0], indent=2)}")
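
The exact output of `s1_l0.create_cql2_filter` is not shown in this notebook. As an assumption, a search body that combines equality tests in the CQL2 JSON encoding could be built as in the sketch below. `equals_filter` is a hypothetical helper and the collection name is made up.

```python
import json


def equals_filter(properties: dict) -> dict:
    """Build a STAC search body that ANDs together 'property = value' tests (CQL2 JSON)."""
    if not properties:
        raise ValueError("At least one property is required")
    tests = [{"op": "=", "args": [{"property": name}, value]}
             for name, value in properties.items()]
    # A single test doesn't need an enclosing "and"
    cql_filter = tests[0] if len(tests) == 1 else {"op": "and", "args": tests}
    return {"filter-lang": "cql2-json", "filter": cql_filter}


# Made-up collection name, for illustration only
body = equals_filter({
    "collection": "alice_my_collection",
    "cadip:session_id": "S1A_20200105072204051312",
})
print(json.dumps(body, indent=2))
```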

<div class="alert alert-info" role="alert">

### Workflow: stage Auxip files

We need to pass Auxip files to the DPR processing. They must be staged into the catalog.

Since the DPR processing is only a simulation for now, we can pass any Auxip files.

[Back to top](#Contents)

</div>

In [None]:
# Let's use the 3 most recent files between an arbitrarily old date and today
files = auxip_client.search_stations(
    datetime(year=1970, month=1, day=1), 
    datetime.today(), 
    TIMEOUT, 
    sortby="-created",
    limit=None) # NOTE: for now "limit" is not working well with "sortby" so don't use it

# Only keep the first 3 files
files = files[:3]

# Save the IDs = the filenames
auxip_files = [f["id"] for f in files]
print_ids = "\n".join(auxip_files)
print(f"Auxip files: \n{print_ids}")

# Save the min and max dates for these 3 files
dates = [datetime.strptime (f["properties"]["created"], "%Y-%m-%dT%H:%M:%S.%fZ") for f in files]
start_datetime = min(dates) - timedelta(seconds=1) # subtract 1 second because the interval is exclusive
stop_datetime = max(dates) + timedelta(seconds=1) # add 1 second
print(f"\nDate interval: '{start_datetime} -> {stop_datetime}'")

In [None]:
# Staging workflow configuration
config = staging.PrefectFlowConfig(
    auxip_client, 
    mission, 
    s3_path = f"s3://{TEMP_S3_BUCKET}/{auxip_client.owner_id}/{auxip_client.station_name}",
    tmp_download_path=None, # no local download
    max_workers=MAX_WORKERS,
    start_datetime=start_datetime,
    stop_datetime=stop_datetime,
    limit=None) # no limit on the number of files

# Start the prefect flow
staging.staging_flow(config)

In [None]:
# We should have 3 items in the Auxip collection
response = requests.get(f"{stac_client.href_catalog}/catalog/collections/{stac_client.owner_id}:{auxip_collection}/items/", **stac_client.apikey_headers)
response.raise_for_status()

content = json.loads(response.content)
staged_auxip = [feature["id"] for feature in content["features"]]
print_ids = "\n".join(staged_auxip)
print (f"{content['context']['returned']} Auxip files are staged:\n{print_ids}")

# Check that the expected Auxip files were staged
for file in auxip_files:
    assert file in staged_auxip

<div class="alert alert-info" role="alert">

### Workflow: DPR simulator

[Back to top](#Contents)

</div>

In [None]:
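# Reload the workflow module (already imported in a previous cell) to pick up any local changes
from importlib import reload
reload(s1_l0)

# Use a fixed Cadip session ID for the DPR simulation
session_id = "S1A_20200105072204051312"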

In [None]:
from rs_workflows import s1_l0

# The product types to process can be any of these 4 values
product_types = ["S1SEWRAW", "S1SIWRAW", "S1SSMRAW", "S1SWVRAW"]

# DPR workflow configuration
config = s1_l0.PrefectS1L0FlowConfig(
    stac_client,
    RSPY_DPR_SIMU_URL,
    mission,
    session_id,
    product_types,
    auxip_files,
    s3_path = f"s3://{FINAL_S3_BUCKET}/{stac_client.owner_id}/DPR_S1L0",
    temp_s3_path = f"s3://{TEMP_S3_BUCKET}/{stac_client.owner_id}/DPR_S1L0",
)

# Start the prefect flow
s1_l0.s1_l0_flow(config)

In [None]:
# The DPR collection name is hardcoded in the workflow
dpr_collection = f"{mission}_dpr"

# We should have one item in the DPR collection for each product type
response = requests.get(f"{stac_client.href_catalog}/catalog/collections/{stac_client.owner_id}:{dpr_collection}/items/", **stac_client.apikey_headers)
response.raise_for_status()

content = json.loads(response.content)
features = content["features"]
ids = "\n".join([feature["id"] for feature in features])

print (f"{len(features)} DPR items in the catalog:\n{ids}")
print (f"\nFirst item:\n{json.dumps(features[0], indent=2)}")

<mark>TODO: show DPR items from the catalog</mark>

<hr>
<a href="https://github.com/RS-PYTHON" target="_blank">View on GitHub</a>