<table align='right'><tr>
<td style="padding:10px"><img src="resources/img/EC_POS.png" style="max-height:50px;width:auto;"/></td>
<td style="padding:10px"><img src="resources/img/ESA_logo_2020_Deep.png" style="max-height:40px;width:auto;"/></td>
<td style="padding:10px"><img src="resources/img/Copernicus_blue.png" style="max-height:60px;width:auto;"/></td>
<td style="padding:10px"><img src="resources/img/AIRBUS_Blue.png" style="max-height:30px;width:auto;"/></td>
<td style="padding:10px"><img src="resources/img/CS-GROUP.png" style="max-height:50px;width:auto;"/></td>
</tr></table>

<font color="#138D75">**Copernicus Reference System Python**</font> <br>
**Copyright:** Copyright 2024 ESA <br>
**License:** Apache License, Version 2.0 <br>
**Authors:** Airbus, CS Group

<div class="alert alert-block alert-success">
<h3>Copernicus Reference System Python tutorial for the ESA checkpoint 0.1</h3></div>

# Introduction

## Links

* GitHub: https://github.com/RS-PYTHON
* Documentation: https://rs-python.github.io/rs-documentation/

## Data used

In this notebook, we use simulated Auxip and Cadip data.

<mark>TO BE DEFINED: will we use real data from real stations for the checkpoint?</mark>

## Learning outcomes

At the end of this notebook you will know how to:
* Use the RS-Client Python library.
* Use it to:
    * Search individual Auxip files and Cadip chunk files from the Auxip and Cadip simulators.
    * Stage these files into the RS-Server STAC catalog.
    * Use most of the STAC catalog functionalities.
* Call Prefect flows to run parallel tasks to:
    * Stage multiple Auxip and Cadip files at once.
    * Run a simulated DPR processing on the staged files, and save results in the catalog.

## Outline

<div class="alert alert-info" role="alert">

## Contents

</div>
    
1. [Check your installation](#Check-your-installation) 
1. [RsClient initialisation](#RsClient-initialisation)
1. [Call services manually](#Call-services-manually)
    1. [Search Auxip and Cadip stations](#Search-Auxip-and-Cadip-stations)
    1. [Stage Auxip and Cadip files](#Stage-Auxip-and-Cadip-files)
    1. [Use the STAC catalog](#Use-the-STAC-catalog)
    1. [Search Cadip sessions](#Search-Cadip-sessions)
    1. [Exercises](#Exercises:-call-services-manually)
1. [Prefect workflows](#Prefect-workflows)
    1. [Initialisation](#Workflow:-initialisation)
    1. [Stage Cadip chunk files](#Workflow:-stage-Cadip-chunk-files)
    1. [Stage Auxip files](#Workflow:-stage-Auxip-files)
    1. [DPR simulator](#Workflow:-DPR-simulator)
    1. [Exercises](#Exercises:-workflows)

<hr>

<div class="alert alert-info" role="alert">

## Check your installation

In this section, we will check that your Jupyter Notebook environment is correctly set up.

[Back to top](#Contents)

</div>

### `rs-client-libraries` installation

The `rs-client-libraries` Python library is the preferred way to access the RS-Server services from your environment. It is automatically installed in this notebook.

**Note**: you can ignore these OpenTelemetry messages for now; they will be fixed in a later version:
```
Overriding of current TracerProvider is not allowed
Attempting to instrument while already instrumented
Transient error StatusCode.UNAVAILABLE encountered while exporting metrics to ..., retrying in ...s
Failed to export metrics to ..., error code: StatusCode.UNIMPLEMENTED
```
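
If an import in the next cell fails, it can help to check which packages are actually visible to the notebook kernel. The following is a minimal sketch using only the standard library; `find_missing_packages` is a hypothetical helper written for this tutorial, not part of `rs-client-libraries`.

```python
import importlib.util

# Top-level packages imported by this notebook
REQUIRED_PACKAGES = ["rs_client", "rs_common", "rs_workflows"]


def find_missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing_packages(REQUIRED_PACKAGES)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All RS-Client packages are installed")
```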

In [None]:
import rs_client
import rs_common
import rs_workflows

# Set logger level to info
import logging
rs_common.logging.Logging.level = logging.INFO

### Environment

In [None]:
import os

# In local mode, all your services are running locally.
# In hybrid or cluster mode, we use the services deployed on the RS-Server website.
# This configuration is set in an environment variable.
local_mode = (os.getenv("RSPY_LOCAL_MODE") == "1")

# In local mode, the service URLs are hardcoded in the docker-compose file
if local_mode:
    rs_server_href = None # not used
    RSPY_HOST_AUXIP = "http://localhost:8001/docs"
    RSPY_HOST_CADIP = "http://localhost:8002/docs"
    RSPY_HOST_CATALOG = "http://localhost:8003/api.html"
    RSPY_PREFECT_URL = "http://localhost:4200"
    RSPY_DPR_SIMU_URL = "http://dpr-simulator:8000"
    print (f"Auxip service: {RSPY_HOST_AUXIP}")
    print (f"CADIP service: {RSPY_HOST_CADIP}")
    print (f"Catalog service: {RSPY_HOST_CATALOG}")
    print (f"MinIO dashboard (object storage): http://localhost:9101 with user=minio password=Strong#Pass#1234")
    print (f"Prefect dashboard (orchestrator): {RSPY_PREFECT_URL}")
    print (f"Grafana dashboard (logs, traces, metrics): http://localhost:3000/explore")

# In hybrid or cluster mode, they are set in environment variables
else:
    rs_server_href = os.environ["RSPY_WEBSITE"]
    RSPY_PREFECT_URL = os.environ['RSPY_PREFECT_URL']
    RSPY_DPR_SIMU_URL = os.environ["RSPY_DPR_SIMU_URL"]
    print (f"RS-Server website: {rs_server_href}")
    print (f"Create an API key: {rs_server_href}/docs#/API-Key%20Manager/create_api_key_apikeymanager_auth_api_key_new_get")
    print (f"Prefect dashboard (orchestrator): {RSPY_PREFECT_URL}")
    print (f"Grafana dashboard (logs, traces, metrics): {os.environ['RSPY_GRAFANA_URL']}")

### API key

In hybrid and cluster mode, you need an API key to access the RS-Server services. 

You must create one using the link displayed by the previous cell. See: <https://rs-python.github.io/rs-documentation/rs-server/docs/doc/users/oauth2_apikey_manager>

Then enter it manually in the cell below. 

It is easier to save it into your `~/.env` file so it is loaded automatically by your notebooks. To do so, run this line from your JupyterHub terminal, then restart this notebook kernel: 

```shell
# Replace by your value
echo "export RSPY_APIKEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" >> ~/.env
```
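
How the notebook environment loads `~/.env` is not shown in this tutorial. As an illustration only, a file in that format could be read as sketched below. `parse_env_lines` and `load_env_file` are hypothetical helpers, not part of `rs-client-libraries`, and the key value in the example is a placeholder.

```python
import os
import shlex
from pathlib import Path


def parse_env_lines(lines):
    """Parse 'export NAME=value' or 'NAME=value' lines into a dictionary."""
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):]
        name, separator, value = line.partition("=")
        if not separator:
            continue  # not an assignment
        # shlex removes optional shell quotes around the value
        parts = shlex.split(value)
        values[name.strip()] = parts[0] if parts else ""
    return values


def load_env_file(path=None):
    """Copy the file values into os.environ, keeping variables that are already set."""
    path = Path(path) if path else Path.home() / ".env"
    if not path.is_file():
        return {}
    values = parse_env_lines(path.read_text().splitlines())
    for name, value in values.items():
        os.environ.setdefault(name, value)
    return values


example = ["# Replace by your value", "export RSPY_APIKEY=placeholder-key"]
print(parse_env_lines(example))
```

Using `os.environ.setdefault` means a variable already exported in the session takes precedence over the file.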

In [None]:
apikey = os.getenv("RSPY_APIKEY")
if (not local_mode) and (not apikey):
    import getpass
    apikey = getpass.getpass("Enter your API key: ")
    os.environ["RSPY_APIKEY"] = apikey

### S3 buckets (object storage)

The temporary S3 bucket is used to download the Auxip and Cadip files. 

When we publish these files into the STAC catalog, they are moved from the temporary into the final S3 bucket.
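
Later cells build temporary paths of the form `s3://<bucket>/<user>/<station>`. The sketch below shows how such a URL splits into a bucket and a key, and what the same key would look like in another bucket. It is an illustration under assumptions: the file name is made up, and the key layout that RS-Server really uses in the final bucket may differ.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def with_bucket(url, new_bucket):
    """Return the same key in another bucket (assumed layout, for illustration)."""
    _, key = split_s3_url(url)
    return f"s3://{new_bucket}/{key}"


# Hypothetical user, station and file name
temp_url = "s3://rs-cluster-temp/my_user/AUXIP/example_file.raw"
print(split_s3_url(temp_url))
print(with_bucket(temp_url, "rs-cluster-catalog"))
```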

In [None]:
# We use these bucket names that are deployed on the cluster. 
# RS-Server has read/write access to these buckets, but as an end-user, you won't manipulate them directly.
TEMP_S3_BUCKET = "rs-cluster-temp"
FINAL_S3_BUCKET = "rs-cluster-catalog"

# Except in local mode, where we use a local MinIO object storage instance.
# We need to manually create the buckets.
if local_mode:
    !pip install boto3
    from resources.utils import create_s3_buckets
    create_s3_buckets(TEMP_S3_BUCKET, FINAL_S3_BUCKET)

<div class="alert alert-info" role="alert">

## RsClient initialisation

Initialise Python RsClient class instances to access the RS-Server services.

[Back to top](#Contents)

</div>

In [None]:
import json
from rs_client.rs_client import RsClient
from rs_common.config import ECadipStation

# Init a generic RS-Client instance. Pass the:
#   - RS-Server website URL
#   - API key
#   - ID of the owner of the STAC catalog collections.
#     By default, this is the user login from the keycloak account, associated to the API key.
#     Or, in local mode, this is the local system username.
#     Else, your API Key must give you the rights to read/write on this catalog owner (see next cell).
#   - Logger (optional, a default one can be used)
generic_client = RsClient(rs_server_href, apikey, owner_id=None, logger=None)
print(f"STAC catalog owner: {generic_client.owner_id!r}")

# From this generic instance, get an Auxip client instance
auxip_client = generic_client.get_auxip_client()

# Or get a Cadip client instance. Pass the cadip station.
cadip_station = ECadipStation.CADIP # you can also have: INS, MPS, MTI, NSG, SGS
cadip_client = generic_client.get_cadip_client(cadip_station)

# Or get a Stac client to access the catalog
stac_client = generic_client.get_stac_client()

print("\nCheck that our catalog complies with the STAC format...")
stac_client.validate_all()

print("\nDisplay the Stac catalog as a treeview in notebook:")
display(stac_client)

print("\nOr just display all its contents at once:")
print(json.dumps(stac_client.to_dict(), indent=2))

In [None]:
# In hybrid or cluster mode, show information from the keycloak account, associated to the api key
if not local_mode:

    # = keycloak account user login
    print(f"API key user login: {generic_client.apikey_user_login!r}")

    # Print the IAM (Identity and Access Management) roles
    # For this tutorial, you must have: 
    #   - read/download access for Adgs (=Auxip) = "rs_adgs_<read|download>"
    #   - read/download access to the Cadip station you passed on the above cell = "rs_cadip_<station>_<read|download>"
    #   - (optional) read/write/download access to STAC catalog collections from other owners = "rs_catalog_<owner_id>:<collection|*>_<read|write|download>"
    #     (you always have all access to your own collections with owner_id=apikey_user_login as printed above)
    iam_roles = "\n".join (sorted (generic_client.apikey_iam_roles))
    print(f"\nAPI key IAM roles: \n{iam_roles}")

<div class="alert alert-info" role="alert">

## Call services manually

In this section, we will see how to call these services manually:

  * Search Auxip and Cadip reception stations for new files
  * Stage these files and check the staging status
  * STAC catalog services
  * Search Cadip sessions

[Back to top](#Contents)

</div>

In [None]:
# Do some initialisation
from datetime import datetime
import json
from time import sleep

from rs_common.config import EDownloadStatus, EPlatform

# Define a search interval
start_date = datetime(2010, 1, 1, 12, 0, 0)
stop_date = datetime(2024, 1, 1, 12, 0, 0)

<div class="alert alert-info" role="alert">

### Search Auxip and Cadip stations

[Back to top](#Contents)

</div>

In [None]:
# Do this using the Auxip client (to find Auxip files) 
# then the Cadip client (to find Cadip chunk files)
for client in [auxip_client, cadip_client]:

    # Call the service to search the reception stations for new files in the date interval.
    files = client.search_stations(start_date, stop_date)
    
    file_count = len(files)
    assert file_count, f"We should have at least one {client.station_name} file"
    print (f"Found {file_count} {client.station_name} files\n")

    # Print the first file metadata. It is in the STAC format.
    print(f"First {client.station_name} file:\n{json.dumps(files[0], indent=2)}\n")

    # By default, the files are returned sorted by the most recent first (by creation date)
    ids="\n".join([f"{f['properties']['created']} - {f['id']}" for f in files[:10]])
    print(f"Most recent {client.station_name} IDs and datetimes:\n{ids}\n")

In [None]:
# We can sort by +/- any property, e.g. by creation date ascending = the oldest first
for client in [auxip_client, cadip_client]:    
    files = client.search_stations(start_date, stop_date, sortby="+created")
    ids="\n".join([f"{f['properties']['created']} - {f['id']}" for f in files[:10]])
    print(f"Oldest {client.station_name} IDs and datetimes:\n{ids}\n")

<div class="alert alert-info" role="alert">

### Stage Auxip and Cadip files

When RS-Server stages a file, it:
1. Copies (=downloads) it from the reception station into the temporary S3 bucket.
1. Publishes its metadata into the STAC catalog and moves it from the temporary into the final S3 bucket.

[Back to top](#Contents)

</div>
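
The first step is asynchronous, so the client has to poll the staging status until it reaches a final state. The sketch below shows one way to poll with a timeout so that a stuck download cannot block the notebook forever. `wait_until` is a hypothetical helper, and the status strings stand in for the `EDownloadStatus` values (`"IN_PROGRESS"` is an assumed name).

```python
import time

FINAL_STATUSES = ("DONE", "FAILED")


def wait_until(get_status, is_final, timeout=60.0, interval=1.0):
    """Poll get_status() until is_final(status) is true, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if is_final(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status is still {status!r} after {timeout} seconds")
        time.sleep(interval)


# Stand-in for the staging status service: in progress twice, then done
fake_statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "DONE"])
final_status = wait_until(
    lambda: next(fake_statuses),
    lambda status: status in FINAL_STATUSES,
    timeout=5,
    interval=0.01,
)
print(f"Final status: {final_status}")
```

With the real client, `get_status` could wrap a call such as `client.staging_status(filename)`.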

In [None]:
temp_s3_files = []
for client in [auxip_client, cadip_client]:

    # When searching stations, we can also limit the number of returned results.
    # For this example, let's keep only one file.
    files = client.search_stations(start_date, stop_date, limit=1)
    assert len(files) == 1

    # We stage by filename = the file ID
    first_filename = files[0]["id"]

    # We must give a temporary S3 bucket path where to copy the file from the station.
    # Use our API key username to avoid conflicts with other users.
    s3_path = f"s3://{TEMP_S3_BUCKET}/{client.apikey_user_login}/{client.station_name}"
    temp_s3_files.append (f"{s3_path}/{first_filename}") # save it for later

    # We can also download the file locally to the server, but this is useful only in local mode
    local_path = None

    # Call the staging service
    client.staging(first_filename, s3_path=s3_path, tmp_download_path=local_path)

    # Then we can check when the staging has finished by calling the check status service
    while True:
        status = client.staging_status(first_filename)
        print (f"Staging status for {first_filename!r}: {status.value}")
        if status in [EDownloadStatus.DONE, EDownloadStatus.FAILED]:
            print("\n")
            break
        sleep(1)        
    assert status == EDownloadStatus.DONE, "Staging has failed"

    # WARNING: the file is copied into the temporary S3 bucket but is not yet published into the catalog

<div class="alert alert-info" role="alert">

### Use the STAC catalog

The SpatioTemporal Asset Catalog (STAC) family of specifications aims to standardize the way geospatial asset metadata is structured and queried. 

A 'spatiotemporal asset' is any file that represents information about the Earth captured in a certain space and time. 

For more information, see: https://github.com/radiantearth/stac-api-spec/tree/main

In this section, we will see how to use most of the RS-Server STAC catalog functionalities.

[Back to top](#Contents)

</div>
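
To make the format concrete, here is a sketch of a STAC item written as a plain Python dictionary, with a small check of the top-level fields that the STAC Item specification requires. This is only a partial check for illustration, not a replacement for full validation, and the item ID and asset path are made-up values.

```python
import json

REQUIRED_ITEM_FIELDS = ("type", "stac_version", "id", "geometry", "properties", "links", "assets")


def missing_item_fields(item):
    """Return the required STAC item fields that are absent (partial check only)."""
    missing = [field for field in REQUIRED_ITEM_FIELDS if field not in item]
    # A bbox is required when the geometry is not null
    if item.get("geometry") is not None and "bbox" not in item:
        missing.append("bbox")
    if "datetime" not in item.get("properties", {}):
        missing.append("properties.datetime")
    return missing


item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example_file.raw",  # hypothetical ID
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-180, -90], [180, -90], [180, 90], [-180, 90], [-180, -90]]],
    },
    "bbox": [-180.0, -90.0, 180.0, 90.0],
    "properties": {"datetime": "2024-01-01T12:00:00Z"},
    "links": [],
    "assets": {"file": {"href": "s3://rs-cluster-temp/my_user/example_file.raw"}},
}

print(f"Missing fields: {missing_item_fields(item)}")
print(json.dumps(item, indent=2))
```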

#### Add a new collection to the catalog

In [None]:
from pystac import Collection, Extent, SpatialExtent, TemporalExtent

COLLECTION = "my_collection"

# Clean the existing collection, if any
stac_client.remove_collection(COLLECTION)

# In this tutorial, after each operation, we will check that 
# our catalog complies with the STAC format, but this is optional.
stac_client.validate_all()

# Add new collection 
response = stac_client.add_collection(
    Collection(
        id=COLLECTION,
        description=None, # rs-client will provide a default description for us
        extent=Extent(
            spatial=SpatialExtent(bboxes=[-180.0, -90.0, 180.0, 90.0]),
            temporal=TemporalExtent([start_date, stop_date])
        )
    ))
response.raise_for_status()
stac_client.validate_all()

#### Read collections from the catalog

In [None]:
# See all my personal catalog collections
for collection in stac_client.get_collections():
    print(f"I have collection: {collection} at {collection.self_href}")

# Get a specific collection information
my_collection = stac_client.get_collection(collection_id=COLLECTION)
print(f"\nCollection information from {my_collection.self_href}\n{json.dumps(my_collection.to_dict(), indent=2)}")

#### Add new items to the collection

In [None]:
from pystac.asset import Asset
from pystac.item import Item

# Simulated values
WIDTH=2500
HEIGHT=2500

# We will add one Auxip and one Cadip file that were staged from the previous steps
for temp_s3_file in temp_s3_files:

    # Let's use STAC item ID = filename
    print(f"Add catalog item from: {temp_s3_file!r}")
    item_id = os.path.basename(temp_s3_file)

    # The file path from the temp s3 bucket is given in the assets
    assets = {"file": Asset(href=temp_s3_file)}

    # Other hardcoded parameters for this demo
    geometry = {
        "type": "Polygon",
        "coordinates": [[[-180, -90], [180, -90], [180, 90], [-180, 90], [-180, -90]]],
    }
    bbox = [-180.0, -90.0, 180.0, 90.0]
    now = datetime.now()
    properties = {
        "gsd": 0.12345,
        "width": WIDTH,
        "height": HEIGHT,
        "datetime": now,
        "proj:epsg": 3857,
        "orientation": "nadir",
        "owner_id": stac_client.owner_id,
    }

    # WARNING: after this, the staged file is moved from the temporary into the final bucket,
    # so this cell can be run only once, or you'll need to stage the files again from the previous section.

    # Add item to the STAC catalog collection, check status is OK
    item = Item(
        id=item_id,
        geometry=geometry,
        bbox=bbox,
        datetime=now,
        properties=properties,
        assets=assets)
    response = stac_client.add_item(COLLECTION, item)
    response.raise_for_status()
    stac_client.validate_all()

#### Read items from the collection

In [None]:
# Get the items from the catalog to check that they were inserted
for temp_s3_file in temp_s3_files:
    item_id = os.path.basename(temp_s3_file)
    inserted_item = my_collection.get_item(item_id)
    assert inserted_item, "Item was not inserted"
    print (f"Saved item in the catalog:\n{json.dumps (inserted_item.to_dict(), indent=2)}")

#### Search items from the catalog
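
The CQL2 filters used in this section are nested dictionaries that combine equality tests with an `and` operator. As a sketch, a filter of that shape can be generated from a dictionary of property values; `cql2_all_equal` is a hypothetical helper, not part of `rs-client-libraries` or `pystac-client`, and the collection name is an example value.

```python
import json


def cql2_all_equal(properties):
    """Build a CQL2 JSON filter matching items where every property equals its value."""
    clauses = [
        {"op": "=", "args": [{"property": name}, value]}
        for name, value in properties.items()
    ]
    if not clauses:
        raise ValueError("At least one property is required")
    # A single test does not need to be wrapped in an 'and'
    return clauses[0] if len(clauses) == 1 else {"op": "and", "args": clauses}


example_filter = cql2_all_equal(
    {"collection": "my_user_my_collection", "width": 2500, "height": 2500}
)
print(json.dumps(example_filter, indent=2))
```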

In [None]:
# For searching, we need to prefix our collection name by <owner_id>_
owner_collection = f"{stac_client.owner_id}_{COLLECTION}"

# Search by the last inserted item id
search = stac_client.search(ids=[item_id], collections=[owner_collection])
results = list(search.items_as_dicts())
assert results, f"There should be at least one item with id={item_id}"
print(f"Found {len(results)} results for id={item_id}")

# Search by the 'width' and 'height' property using a CQL2 filter, 
# see: https://pystac-client.readthedocs.io/en/stable/tutorials/cql2-filter.html
filter_on_dimensions = {
    "op": "and",
    "args": [
        {"op": "=", "args" : [{"property": "collection"}, owner_collection]},
        {"op": "=", "args" : [{"property": "width"}, WIDTH]},
        {"op": "=", "args" : [{"property": "height"}, HEIGHT]},
    ]
}
search = stac_client.search(filter=filter_on_dimensions)
results = list(search.items_as_dicts())
assert results, f"There should be at least one item for width={WIDTH} height={HEIGHT}"
print(f"\nFound {len(results)} results for width={WIDTH} height={HEIGHT}")
for result in results:
    print(f"({result['collection']}) {result['id']}")

#### Remove an item from the collection

In [None]:
# Get all items before removing
items_before = list(stac_client.get_collection(COLLECTION).get_items())
print (f"{len(items_before)} items before removing")

# If there is at least one item
if items_before:

    # Remove the first item
    item_id = items_before[0].id
    stac_client.remove_item (COLLECTION, item_id)
    
    # We should have one less item in the collection
    items_after = list(stac_client.get_collection(COLLECTION).get_items())
    assert len(items_after) == (len(items_before) - 1), \
        f"There should be {len(items_before) - 1} items in the collection, but we have {len(items_after)}"

#### Remove a collection from the catalog

In [None]:
import pystac_client

# Remove the collection
stac_client.remove_collection(COLLECTION)

# It should not exist anymore: trying to get the collection should raise an Exception
try:
    stac_client.get_collection(COLLECTION)
    assert False, f"The collection {COLLECTION!r} should have been removed"

# So it is normal that we have this exception
except pystac_client.exceptions.APIError:
    print (f"The collection {COLLECTION!r} has been removed")

<div class="alert alert-info" role="alert">

### Search Cadip sessions

Each Cadip chunk file is attached to a single Cadip session.

We can search Cadip sessions by parameters, or find information about a specific session ID.

[Back to top](#Contents)

</div>

In [None]:
# Search cadip sessions by date interval and platforms
platforms = [EPlatform.S1A, EPlatform.S2B]
sessions = cadip_client.search_sessions(start_date=start_date, stop_date=stop_date, platforms=platforms)

session_count = len(sessions)
assert session_count, "We should have at least one Cadip session"
print (f"Found {session_count} Cadip sessions")

In [None]:
# Print the first cadip session metadata. It is in the STAC format.
print(f"First Cadip session:\n{json.dumps(sessions[0], indent=2)}\n")

In [None]:
# Print all the Cadip session IDs
ids=[s["id"] for s in sessions]
print_ids="\n".join(ids)
print(f"Cadip sessions IDs:\n{print_ids}")

# Save the first session ID for later
first_cadip_session = ids[0]

In [None]:
# We can also search Cadip sessions by specific session IDs,
# e.g. get information for the cadip sessions #2 and #3
search_ids=ids[1:3]
search_sessions = cadip_client.search_sessions(session_ids=search_ids)
print(f"Cadip sessions information:\n{json.dumps(search_sessions, indent=2)}\n")

<div class="alert alert-danger" role="alert">

### Exercises: call services manually

</div>

Run the previous cells again, but this time: 
1. Check if your API key allows you to access other Cadip stations, and STAC catalog collections from other owners.
    1. Use a RsClient instance with one of these other stations and owner IDs.
    1. Check that you can read and download Cadip files.
    1. Check that this owner ID is used in the STAC catalog.
1. Search Auxip and Cadip files using a different time interval.
1. Stage at least one different Auxip and Cadip file into the STAC catalog.
1. Print one different Cadip session information.

**NOTE**: you can also use the website OpenAPI Swagger UI to call RS-Server (see cell below).

[Back to top](#Contents)

<hr>

In [None]:
if local_mode:
    print(f"""OpenAPI Swagger UI for:
  - Auxip: {RSPY_HOST_AUXIP}
  - Cadip: {RSPY_HOST_CADIP}
  - STAC catalog: {RSPY_HOST_CATALOG}""")
else:
    print(f"OpenAPI Swagger UI: {generic_client.rs_server_href}")

<div class="alert alert-info" role="alert">

## Prefect workflows

Prefect is a workflow orchestration tool. We will use it to:
* Stage multiple Auxip and Cadip files at once.
* Run a simulated DPR processing on the staged files, and save results in the catalog.

[Back to top](#Contents)
</div>
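
The idea of staging several files at once with a bounded number of parallel tasks can be illustrated without Prefect. The sketch below uses the standard library thread pool instead; it is not how `rs_workflows` is implemented, and `stage_one` is a made-up stand-in for staging a single file.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def stage_one(filename):
    """Hypothetical stand-in for staging a single file."""
    return f"{filename}: DONE"


def stage_many(filenames, max_workers=4):
    """Run stage_one on every file, with at most max_workers tasks in parallel."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(stage_one, name): name for name in filenames}
        for future in as_completed(futures):
            # result() re-raises any exception from the task
            results[futures[future]] = future.result()
    return results


results = stage_many(["file_1.raw", "file_2.raw", "file_3.raw"], max_workers=2)
for name in sorted(results):
    print(results[name])
```

An orchestrator such as Prefect adds scheduling, retries and a dashboard on top of this basic pattern.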

<div class="alert alert-info" role="alert">

### Workflow: initialisation

[Back to top](#Contents)

</div>
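
The next cell derives the mission and a one-day interval from the session ID, which is assumed to follow the pattern `<mission>_YYYYmmdd<other_info>`. The same logic can be sketched as a small function with explicit error handling; `session_day_interval` is a hypothetical helper and the session ID in the example is made up.

```python
from datetime import datetime, timedelta


def session_day_interval(session_id):
    """Return (mission, start, stop) where [start, stop] covers the session day."""
    mission, separator, rest = session_id.partition("_")
    if not separator or len(rest) < 8:
        raise ValueError(f"Unexpected session ID format: {session_id!r}")
    day = datetime.strptime(rest[:8], "%Y%m%d")  # midnight of the session day
    return mission, day, day + timedelta(days=1)


# Hypothetical session ID
mission_name, day_start, day_stop = session_day_interval("S1A_20240101120000012345")
print(f"Mission: {mission_name!r}")
print(f"Date interval: '{day_start} -> {day_stop}'")
```

Using `partition` rather than `split` keeps the function tolerant of IDs that contain more than one underscore.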

In [None]:
from datetime import timedelta

# The workflow will stage all files between a date interval.
# In this example, we will stage all the Cadip chunk files from the first session.
# We extract the mission and date from the session ID: <mission>_YYYYmmdd<other_info>
session_id = first_cadip_session
print(f"First Cadip session ID: {session_id!r}")
mission, date_and_other = session_id.split("_")
date = datetime.strptime (date_and_other[:8], "%Y%m%d")
start_date = date # start from midnight
stop_date = date + timedelta(days=1) # midnight the day after

print(f"Mission: {mission!r}")
print(f"Date interval: '{start_date} -> {stop_date}'")

# Note: we will miss files if the session overlaps two days. 
# We could also get the time interval from the session information 
# but the simulated data used in this notebook is not relevant.
session_info = cadip_client.search_sessions(session_ids=[session_id])
print(f"Date interval from the session information (not used): "
      f"'{session_info[0]['properties']['start_datetime']} -> {session_info[0]['properties']['end_datetime']}'")

In [None]:
# As a prerequisite, we must manually create the Auxip and Cadip 
# collections in the catalog, if they don't already exist.
from pystac_client.exceptions import APIError
from rs_workflows import staging

for client in [auxip_client, cadip_client]:
    collection_name = staging.create_collection_name(mission, client.station_name)

    # Save the collection name for later
    if client == auxip_client:
        auxip_collection = collection_name
    else:
        cadip_collection = collection_name

    # Try to get collection information
    try:
        stac_client.get_collection(collection_id=collection_name)
        print(f"Collection already exists: {collection_name!r}")

    # If it fails, this means that the collection doesn't exist, so create it
    except APIError:

        print(f"Create collection: {collection_name!r}")
        response = stac_client.add_collection(
            Collection(
                id=collection_name,
                description=None, # rs-client will provide a default description for us
                extent=Extent(
                    spatial=SpatialExtent(bboxes=[-180.0, -90.0, 180.0, 90.0]),
                    temporal=TemporalExtent([start_date, stop_date])
                )
            ))
        response.raise_for_status()
        stac_client.validate_all()

<div class="alert alert-info" role="alert">

### Workflow: stage Cadip chunk files

[Back to top](#Contents)

</div>

In [None]:
print(f"\nView Prefect flow runs from: {RSPY_PREFECT_URL}\n")

In [None]:
# Number of tasks to be run in parallel
MAX_WORKERS = 15

# Staging workflow configuration
config = staging.PrefectFlowConfig(
    cadip_client, 
    mission, 
    s3_path = f"s3://{TEMP_S3_BUCKET}/{cadip_client.owner_id}/{cadip_client.station_name}",
    tmp_download_path=None, # no local download
    max_workers=MAX_WORKERS,
    start_datetime=start_date,
    stop_datetime=stop_date,
    limit=None) # no limit on the number of files

# Start the prefect flow
staging.staging_flow(config)

In [None]:
# Check that our catalog complies with the STAC format.
stac_client.validate_all()

In [None]:
# Find the staged Cadip files in the STAC catalog

# For searching, we need to prefix our collection name by <owner_id>_
owner_collection = f"{stac_client.owner_id}_{cadip_collection}"

# Use a cql2 filter to search by session ID
filter_on_session = {
    "op": "and",
    "args": [
      {"op": "=", "args": [{"property": "collection"}, owner_collection]},
      {"op": "=", "args": [{"property": "cadip:session_id"}, session_id]}
    ]
}

search = stac_client.search(filter=filter_on_session)
results = list(search.items_as_dicts())
assert len(results) > 0, f"At least one Cadip file should be staged for session ID: {session_id!r}"
print (f"\n{len(results)} Cadip files are staged for session ID: {session_id!r}.")
print (f"First one:\n{json.dumps (results[0], indent=2)}")

<div class="alert alert-info" role="alert">

### Workflow: stage Auxip files

We need to pass Auxip files to the DPR processing. They must be staged into the catalog.

Because the DPR processing is only a simulation for now, we can pass any Auxip files.

[Back to top](#Contents)

</div>
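
The staging workflow selects files by date interval, so the cells below compute an interval that encloses the creation dates of the chosen files, with a one-second margin on each side because the bounds are exclusive. A minimal sketch of that computation follows; `enclosing_interval` is a hypothetical helper and the timestamps are made-up values in the same format as the `created` property.

```python
from datetime import datetime, timedelta

CREATED_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def enclosing_interval(created_values, margin=timedelta(seconds=1)):
    """Return (start, stop) strictly enclosing all the given creation timestamps."""
    dates = [datetime.strptime(value, CREATED_FORMAT) for value in created_values]
    if not dates:
        raise ValueError("At least one timestamp is required")
    return min(dates) - margin, max(dates) + margin


# Hypothetical creation timestamps
created = [
    "2019-02-16T12:00:00.000Z",
    "2019-02-17T08:30:15.500Z",
    "2019-02-16T23:59:59.000Z",
]
interval_start, interval_stop = enclosing_interval(created)
print(f"Date interval: '{interval_start} -> {interval_stop}'")
```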

In [None]:
# Let's use the 3 most recent files between today and any old date
files = auxip_client.search_stations(
    datetime(year=1970, month=1, day=1), 
    datetime.today(), 
    sortby="-created",
    limit=None) # NOTE: for now "limit" is not working well with "sortby" so don't use it

# Only keep the first 3 files
files = files[:3]

# Save the IDs = the filenames
auxip_files = [f["id"] for f in files]
print_ids = "\n".join(auxip_files)
print(f"Auxip files: \n{print_ids}")

# Save the min and max dates for these 3 files
dates = [datetime.strptime (f["properties"]["created"], "%Y-%m-%dT%H:%M:%S.%fZ") for f in files]
start_date = min(dates) - timedelta(seconds=1) # subtract 1 second because the interval is exclusive
stop_date = max(dates) + timedelta(seconds=1) # add 1 second
print(f"\nDate interval: '{start_date} -> {stop_date}'")

In [None]:
# Staging workflow configuration
config = staging.PrefectFlowConfig(
    auxip_client, 
    mission, 
    s3_path = f"s3://{TEMP_S3_BUCKET}/{auxip_client.owner_id}/{auxip_client.station_name}",
    tmp_download_path=None, # no local download
    max_workers=MAX_WORKERS,
    start_datetime=start_date,
    stop_datetime=stop_date,
    limit=None) # no limit on the number of files

# Start the prefect flow
staging.staging_flow(config)

In [None]:
# Check that our catalog complies with the STAC format.
stac_client.validate_all()

In [None]:
# These 3 items should be in the STAC catalog, in the Auxip collection

# Be sure that we don't have any duplicate filenames
auxip_files = list(set(auxip_files))

# Search by ID and collection
owner_collection = f"{stac_client.owner_id}_{auxip_collection}"
search = stac_client.search(ids=auxip_files, collections=[owner_collection])
results = list(search.items_as_dicts())
assert len(results) == len(auxip_files), f"{len(results)} Auxip files were staged, we expected {len(auxip_files)}"
print(f"Staged Auxip files:\n" + "\n".join(auxip_files))
print (f"\nFirst one:\n{json.dumps (results[0], indent=2)}")

<div class="alert alert-info" role="alert">

### Workflow: DPR simulator

For now, this simulated DPR processor takes any input and writes any output.

We use it to simulate a L0 processing that takes staged Cadip chunk files and Auxip files as input, and writes raw L0 products as output.

[Back to top](#Contents)

</div>

In [None]:
from rs_workflows import s1_l0

# The product types to process can be any of these 4 values
product_types = ["S1SEWRAW", "S1SIWRAW", "S1SSMRAW", "S1SWVRAW"]

# DPR workflow configuration
config = s1_l0.PrefectS1L0FlowConfig(
    stac_client,
    RSPY_DPR_SIMU_URL,
    mission,
    session_id,
    product_types,
    auxip_files,
    s3_path = f"s3://{FINAL_S3_BUCKET}/{stac_client.owner_id}/DPR_S1L0",
    temp_s3_path = f"s3://{TEMP_S3_BUCKET}/{stac_client.owner_id}/DPR_S1L0",
)

# Start the prefect flow
s1_l0.s1_l0_flow(config)

In [None]:
# Check that our catalog complies with the STAC format.
stac_client.validate_all()

In [None]:
# Check output products in the STAC catalog

# The DPR collection name is hardcoded in the workflow source code
dpr_collection = f"{mission}_dpr"
dpr_products = list(stac_client.get_collection(dpr_collection).get_items())

assert len(dpr_products) > 0, f"At least one DPR product should be saved in the catalog."
print_ids = "\n".join([product.id for product in dpr_products])
print (f"\n{len(dpr_products)} DPR products are saved in the catalog:\n{print_ids}")
print (f"\nFirst one:\n{json.dumps (dpr_products[0].to_dict(), indent=2)}")

<div class="alert alert-danger" role="alert">

### Exercises: workflows

</div>

Run the previous cells again, but this time: 
* Use Cadip chunk files from a different Cadip session.
* Use any other Auxip files.
* Check that Cadip and Auxip files are staged in the STAC catalog.
* Check that the simulated L0 products are staged in the STAC catalog.

[Back to top](#Contents)

<hr>
<a href="https://github.com/RS-PYTHON" target="_blank">View on GitHub</a>