<div class="alert alert-block alert-success">
<h1>RS-Server / RS-Client tutorial for the ESA checkpoint 0.1</h1></div>

## Introduction

This notebook shows how to stage files, save them in the catalog, run these steps from a Prefect flow, run a simulated S1L0 processing from a Prefect flow ...
<mark>TODO: link to the existing documentation that already explains these concepts? Or copy/paste the documentation here?
Or write a simplified version here?</mark>

## <a id='TOC_TOP'></a>Contents

</div>
    
 1. [Check your installation](#check_your_installation)
 1. [RsClient initialisation](#rsclient_initialisation)
 1. [Call services manually](#call_services_manually)

<hr>

<div class="alert alert-info" role="alert">

## <a id='check_your_installation'></a>Check your installation
[Back to top](#TOC_TOP)

</div>

### `rs-client-libraries` installation

The `rs-client-libraries` Python library is the preferred way to access the RS-Server services from your environment. It is automatically installed in this notebook.

In [None]:
import rs_client
import rs_common
import rs_workflows

### Environment

In [None]:
import os

# In local mode, all your services are running locally.
# In hybrid or cluster mode, we use the services deployed on the RS-Server website.
# This configuration is set in an environment variable.
local_mode = (os.getenv("RSPY_LOCAL_MODE") == "1")

# In local mode, print the service URLs
if local_mode:
    print("ADGS service: http://localhost:8001/docs")
    print("CADIP service: http://localhost:8002/docs")
    print("Catalog service: http://localhost:8003/api.html")
    print("MinIO dashboard (object storage): http://localhost:9101 with user=minio password=Strong#Pass#1234")
    print("Prefect dashboard (orchestrator): http://localhost:4200")
    print("Grafana dashboard (logs, traces, metrics): http://localhost:3000/explore")
    url = None  # not used

# In hybrid or cluster mode, the RS-Server website is set in an environment variable.
else:
    url = os.environ["RSPY_WEBSITE"]
    print(f"RS-Server website: {url}")
    print(f"Create an API key: {url}/docs#/API-Key%20Manager/create_api_key_apikeymanager_auth_api_key_new_get")
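
The mode selection above can also be wrapped in a small function, which makes it easier to reuse in other notebooks and to fail early with a clear message. This is only a sketch: `resolve_mode` is a hypothetical helper, not part of `rs-client-libraries`, and stripping the trailing slash from the URL is our own assumption.

```python
from typing import Mapping, Optional, Tuple


def resolve_mode(env: Mapping[str, str]) -> Tuple[bool, Optional[str]]:
    """Return (local_mode, url) from a mapping of environment variables.

    Hypothetical helper: it only wraps the two variables used in this notebook.
    """
    local_mode = env.get("RSPY_LOCAL_MODE") == "1"
    if local_mode:
        # The website URL is not used in local mode
        return True, None

    url = env.get("RSPY_WEBSITE")
    if not url:
        raise RuntimeError("RSPY_WEBSITE must be set in hybrid or cluster mode")

    # Assumption: drop a trailing slash so that f"{url}/docs" has a single slash
    return False, url.rstrip("/")
```

In this notebook it could be called as `local_mode, url = resolve_mode(os.environ)`.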

### S3 buckets (object storage)

In [None]:
# Temp and final s3 buckets used by RS-Server
TEMP_S3_BUCKET = "rs-cluster-temp"
FINAL_S3_BUCKET = "rs-cluster-catalog"

# In local mode, we need to create them manually in the local MinIO object storage
if local_mode:
    !pip install boto3
    from resources.utils import create_s3_buckets
    create_s3_buckets(TEMP_S3_BUCKET, FINAL_S3_BUCKET)

# In hybrid or cluster mode, the buckets already exist

---
**<mark>TO BE DISCUSSED</mark>**

In local mode, is it good advice to point end users to the MinIO, Prefect and Grafana dashboards?

The same question applies in hybrid/cluster mode: should we give these links, and how should they be passed (environment variables)?

---

### API key

In hybrid and cluster mode, you need an API key to access the RS-Server services. You can create one from the link displayed in the previous cell, then enter it manually in the cell below. 

If you prefer to load it automatically in all your notebooks, you can: 

  * From your JupyterHub workspace, open the text file `~/.rspy` <mark>(TODO: name to be defined)</mark>
  * Save your API key using this syntax:

    ```bash
    export RSPY_APIKEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx # replace with your value
    ```

  * Save and close the file.
  * <mark>TO BE CONFIRMED: Reload your JupyterHub session from Menu -> File -> Log Out, or just</mark>
  * <mark>reload this notebook kernel from Menu -> Kernel -> Restart Kernel.</mark>
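
To illustrate the file syntax above, here is a minimal sketch of how such a file could be read from Python. It is not the mechanism used by JupyterHub or by `rs-client-libraries`: `parse_exports` and `load_apikey` are hypothetical helpers, the file name is still to be defined, and the parsing is much simpler than a real shell (everything after a `#` is dropped).

```python
import shlex
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_exports(text: str) -> Dict[str, str]:
    """Parse lines of the form `export NAME=value # comment` into a dict."""
    exports: Dict[str, str] = {}
    for line in text.splitlines():
        # shlex removes quotes; comments=True drops everything from a '#'
        tokens = shlex.split(line, comments=True)
        if len(tokens) == 2 and tokens[0] == "export" and "=" in tokens[1]:
            name, value = tokens[1].split("=", 1)
            exports[name] = value
    return exports


def load_apikey(path: Path, env: Mapping[str, str]) -> Optional[str]:
    """Return RSPY_APIKEY from the environment, else from the file, else None."""
    value = env.get("RSPY_APIKEY")
    if value:
        return value
    if path.is_file():
        return parse_exports(path.read_text(encoding="utf-8")).get("RSPY_APIKEY")
    return None
```

For example, `load_apikey(Path.home() / ".rspy", os.environ)` would prefer the environment variable and fall back to the file.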

In [None]:
apikey = os.getenv("RSPY_APIKEY")
if (not local_mode) and (not apikey):
    import getpass
    apikey = getpass.getpass("Enter your API key: ")

<div class="alert alert-info" role="alert">

## <a id='rsclient_initialisation'></a>RsClient initialisation
[Back to top](#TOC_TOP)

We use Python `RsClient` instances to access the RS-Server services.

</div>

In [None]:
from rs_client.rs_client import RsClient
from rs_common.config import ECadipStation

# Init a generic RS-Client instance. Pass the:
#   - RS-Server website URL
#   - API key
#   - Logger (optional, a default one can be used)
generic_client = RsClient(url, apikey, logger=None)

# From this generic instance, get an Auxip client instance
auxip_client = generic_client.get_auxip_client()

# Or get a Cadip client instance. Pass the cadip station.
cadip_station = ECadipStation.CADIP
cadip_client = generic_client.get_cadip_client(cadip_station)

# Or get a Stac client to access the catalog
stac_client = generic_client.get_stac_client()

<div class="alert alert-info" role="alert">

## <a id='call_services_manually'></a>Call services manually
[Back to top](#TOC_TOP)

In this section, we will see how to call these services manually:

  * Search Auxip and Cadip stations for new files
  * Stage these files
  * Check the staging status
  * Search Cadip sessions

</div>

In [None]:
from datetime import datetime
import json
from time import sleep

from rs_common.config import EDownloadStatus, EPlatform

# Define a search interval
start_date = datetime(2014, 1, 1, 12, 0, 0)
stop_date = datetime(2024, 1, 1, 12, 0, 0)

# Timeout in seconds for the endpoints
TIMEOUT = 30

### Search Cadip sessions

In [None]:
# Search cadip sessions by date interval and platforms
platforms = [EPlatform.S1A, EPlatform.S2B]
sessions = cadip_client.search_sessions(TIMEOUT, start_date=start_date, stop_date=stop_date, platforms=platforms)

session_count = len(sessions)
assert session_count, "We should have at least one Cadip session"
print(f"Found {session_count} Cadip sessions")

In [None]:
# Print the first cadip session. It is in the STAC format.
print(f"First Cadip session:\n{json.dumps(sessions[0], indent=2)}\n")

In [None]:
# Print all the Cadip session IDs
ids = [s["id"] for s in sessions]
print_ids = "\n".join(ids)
print(f"Cadip session IDs:\n{print_ids}")

In [None]:
# We can also search Cadip sessions by specific session IDs,
# rather than by date interval and platforms,
# e.g. get information for the Cadip sessions #2 and #3
search_ids = ids[1:3]
search_sessions = cadip_client.search_sessions(TIMEOUT, session_ids=search_ids)
print(f"Cadip sessions information:\n{json.dumps(search_sessions, indent=2)}\n")

### Search Auxip and Cadip stations

In [None]:
# Do this using the Auxip then the Cadip client
for client in [auxip_client, cadip_client]:

    # Call the service to search stations for new files in the date interval.
    files = client.search_stations(start_date, stop_date, TIMEOUT)
    
    file_count = len(files)
    assert file_count, f"We should have at least one {client.station_name} file"
    print(f"Found {file_count} {client.station_name} files\n")

    # Print the first file. It is in the STAC format.
    print(f"First {client.station_name} file:\n{json.dumps(files[0], indent=2)}\n")

    # By default, the files are returned sorted by creation date, most recent first.
    # We use a new variable name so that the list of Cadip session IDs ("ids") is not overwritten.
    recent = "\n".join([f"{f['properties']['created']} - {f['id']}" for f in files[:10]])
    print(f"Most recent {client.station_name} IDs and datetimes:\n{recent}\n")

In [None]:
# We can sort by +/- any property, e.g. by creation date ascending = the oldest first
for client in [auxip_client, cadip_client]:
    files = client.search_stations(start_date, stop_date, TIMEOUT, sortby="+created")
    # We use a new variable name so that the list of Cadip session IDs ("ids") is not overwritten.
    oldest = "\n".join([f"{f['properties']['created']} - {f['id']}" for f in files[:10]])
    print(f"Oldest {client.station_name} IDs and datetimes:\n{oldest}\n")
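
To make the `+`/`-` convention concrete, the sketch below reproduces the same ordering locally on a list of STAC-like dictionaries. It only illustrates the semantics: the real sorting is done by the service, `sort_items` is a hypothetical helper, and we assume that all timestamps are ISO 8601 strings in the same format, so that they sort correctly as plain text.

```python
from typing import Any, Dict, List


def sort_items(items: List[Dict[str, Any]], sortby: str) -> List[Dict[str, Any]]:
    """Sort STAC-like items by one property: "+name" is ascending, "-name" is descending."""
    direction, name = sortby[:1], sortby[1:]
    if direction not in ("+", "-") or not name:
        raise ValueError(f"Expected '+property' or '-property', got {sortby!r}")
    return sorted(
        items,
        key=lambda item: item["properties"][name],
        reverse=(direction == "-"),
    )


# Made-up items that only contain the fields used in this notebook
example_items = [
    {"id": "file_a", "properties": {"created": "2023-01-02T00:00:00Z"}},
    {"id": "file_b", "properties": {"created": "2021-06-01T00:00:00Z"}},
    {"id": "file_c", "properties": {"created": "2024-01-01T00:00:00Z"}},
]
oldest_first = [item["id"] for item in sort_items(example_items, "+created")]
newest_first = [item["id"] for item in sort_items(example_items, "-created")]
```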

### Stage Auxip and Cadip files

<mark>
TO BE DISCUSSED: Is the temporary bucket name hardcoded? Should we pass it in an environment variable?
Is there only one temp bucket?
</mark>

<mark>We will have conflicts: if UserA and UserB stage the same file at the same time
in the same s3 location, it is staged only once. But when UserA then pushes the file to their collection,
it is deleted from the temp bucket, so UserB can no longer use it.
</mark>

<mark>Maybe we should force UserA to use the s3 location s3://rs-cluster-temp/UserA/dirs/filename</mark>
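
As an illustration of the per-user layout discussed above, the sketch below builds and splits such s3 locations. Both functions are hypothetical helpers, and the `bucket/owner/station/filename` layout is the one used in the next cell, not something enforced by RS-Server.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def staging_location(bucket: str, owner_id: str, station: str, filename: Optional[str] = None) -> str:
    """Build s3://bucket/owner_id/station[/filename], one prefix per user."""
    parts = [bucket, owner_id, station] + ([filename] if filename else [])
    for part in parts:
        if not part or "/" in part:
            raise ValueError(f"Invalid path component: {part!r}")
    return "s3://" + "/".join(parts)


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split s3://bucket/some/key into ("bucket", "some/key")."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")
```

With this layout, two users who stage the same file write to different keys, so deleting one copy does not affect the other.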

In [None]:
staged_files = []
for client in [auxip_client, cadip_client]:

    # When searching stations, we can also limit the number of returned results.
    # Let's keep only one file.
    files = client.search_stations(start_date, stop_date, TIMEOUT, limit=1)
    assert len(files) == 1

    # We stage by filename = the file ID
    first_filename = files[0]["id"]

    # We stage this file = download it into a temporary s3 bucket (object storage)
    # before it is pushed to the catalog.
    s3_path = f"s3://{TEMP_S3_BUCKET}/{client.owner_id}/{client.station_name}"
    staged_files.append(f"{s3_path}/{first_filename}")  # save it for later

    # We can also download the file locally to the server, but this is useful only in local mode
    local_path = None

    # Call the staging service
    client.staging(first_filename, TIMEOUT, s3_path=s3_path, tmp_download_path=local_path)

    # Then we can check when the staging has finished by calling the check status service
    while True:
        status = client.staging_status(first_filename, TIMEOUT)
        print(f"Staging status for {first_filename!r}: {status.value}")
        if status in [EDownloadStatus.DONE, EDownloadStatus.FAILED]:
            print("\n")
            break
        sleep(1)
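
The `while True` loop above never gives up if the staging stays in progress. A possible variant, sketched below, polls with an overall deadline. `wait_for_status` is a hypothetical helper, and the `Status` enum is only a stand-in for `EDownloadStatus` so that the sketch runs without RS-Server; the `IN_PROGRESS` member name is an assumption.

```python
import time
from enum import Enum
from typing import Callable, Collection


class Status(Enum):
    """Stand-in for EDownloadStatus (member names partly assumed)."""
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"
    FAILED = "FAILED"


def wait_for_status(
    get_status: Callable[[], Enum],
    final_states: Collection[Enum],
    max_wait: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Enum:
    """Call get_status() until it returns a final state, or raise TimeoutError."""
    deadline = time.monotonic() + max_wait
    while True:
        status = get_status()
        if status in final_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status is still {status} after {max_wait} seconds")
        sleep(interval)
```

In the cell above, this could be called as `wait_for_status(lambda: client.staging_status(first_filename, TIMEOUT), [EDownloadStatus.DONE, EDownloadStatus.FAILED])`.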

### Use the STAC catalog

<mark>TODO: this part must be completed and replaced by the StacClient</mark>

#### Add a new collection

In [None]:

import requests

# Add a new collection 

COLLECTION = "my_collection"

collection = {
    "id": COLLECTION,
    "type": "Collection",
    "description": "This is my collection description",
    "stac_version": "1.0.0",
    "owner": stac_client.owner_id,
}

# Clean the existing collection, if any
requests.delete(f"{stac_client.href_catalog}/catalog/collections/{stac_client.owner_id}:{COLLECTION}", **stac_client.apikey_headers, stream=True)

# Create it from new
post_response = requests.post(f"{stac_client.href_catalog}/catalog/collections", json=collection, **stac_client.apikey_headers)
post_response.raise_for_status()

# See my personal catalog information
response = requests.get(f"{stac_client.href_catalog}/catalog/catalogs/{stac_client.owner_id}", **stac_client.apikey_headers)
response.raise_for_status()
print(f"My catalog information:\n{json.dumps(response.json(), indent=2)}")

# See my collection information
response = requests.get(f"{stac_client.href_catalog}/catalog/collections/{stac_client.owner_id}:{COLLECTION}", **stac_client.apikey_headers)
response.raise_for_status()
print(f"\nMy collection information:\n{json.dumps(response.json(), indent=2)}")
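
The same URL patterns are repeated in several calls of this section. As a sketch, they could be built by two small functions. These are hypothetical helpers that only mirror the URLs used in this notebook; they are not part of the `StacClient` API.

```python
from typing import Optional


def collection_url(href_catalog: str, owner_id: str, collection_id: Optional[str] = None) -> str:
    """URL of all the collections, or of one "owner_id:collection_id" collection."""
    base = f"{href_catalog.rstrip('/')}/catalog/collections"
    if collection_id is None:
        return base
    return f"{base}/{owner_id}:{collection_id}"


def item_url(href_catalog: str, owner_id: str, collection_id: str, item_id: Optional[str] = None) -> str:
    """URL of all the items of a collection, or of one item."""
    base = f"{collection_url(href_catalog, owner_id, collection_id)}/items"
    if item_id is None:
        return base
    return f"{base}/{item_id}"
```

For example, the collection request above would become `requests.get(collection_url(stac_client.href_catalog, stac_client.owner_id, COLLECTION), **stac_client.apikey_headers)`.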

#### Add new items

In [None]:
# We will add one Auxip and one Cadip file that were staged from the previous steps
for s3_file in staged_files:

    # Let's use item ID = filename
    print(f"\nAdd catalog item from: {s3_file!r}")
    item_id = os.path.basename(s3_file)
    
    item = {
                "id": item_id,
                "bbox": [-94.6334839, 37.0332547, -94.6005249, 37.0595608],
                "type": "Feature",
                "geometry": {
                    "type": "Polygon",
                    "coordinates": [
                        [
                            [-94.6334839, 37.0595608],
                            [-94.6334839, 37.0332547],
                            [-94.6005249, 37.0332547],
                            [-94.6005249, 37.0595608],
                            [-94.6334839, 37.0595608],
                        ]
                    ],
                },
                "collection": COLLECTION,
                "properties": {
                    "gsd": 0.5971642834779395,
                    "width": 2500,
                    "height": 2500,
                    "datetime": "2000-02-02T00:00:00Z",
                    "proj:epsg": 3857,
                    "orientation": "nadir",
                    "owner_id": stac_client.owner_id,
                },
                "stac_extensions": [],
                "assets": {
                    "file": {
                        "href": s3_file,
                        "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                        "title": "NOAA STORM COG",
                    },
                },
            }

    # WARNING: after this, the staged file is moved from the temporary to the final bucket,
    # so this cell can be run only once, or you'll need to stage the files again.
    
    post_response = requests.post(f"{stac_client.href_catalog}/catalog/collections/{stac_client.owner_id}:{COLLECTION}/items", json=item, **stac_client.apikey_headers)
    post_response.raise_for_status()
    
    response = requests.get(f"{stac_client.href_catalog}/catalog/collections/{stac_client.owner_id}:{COLLECTION}/items/{item_id}", **stac_client.apikey_headers)
    response.raise_for_status()
    inserted_item = response.json()
    print(f"Saved item in the catalog:\n{json.dumps(inserted_item, indent=2)}")
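
In the item above, the `bbox` and the polygon `geometry` are written by hand and must stay consistent with each other. One way to avoid a mismatch is to derive the bounding box from the polygon, as in the sketch below. `bbox_from_polygon` is a hypothetical helper, and it does not handle polygons that cross the antimeridian.

```python
from typing import List, Sequence


def bbox_from_polygon(coordinates: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Return [min_lon, min_lat, max_lon, max_lat] from GeoJSON Polygon coordinates."""
    exterior_ring = coordinates[0]  # the first ring is the exterior boundary
    lons = [point[0] for point in exterior_ring]
    lats = [point[1] for point in exterior_ring]
    return [min(lons), min(lats), max(lons), max(lats)]


# Same polygon as in the item above
polygon = [
    [
        [-94.6334839, 37.0595608],
        [-94.6334839, 37.0332547],
        [-94.6005249, 37.0332547],
        [-94.6005249, 37.0595608],
        [-94.6334839, 37.0595608],
    ]
]
bbox = bbox_from_polygon(polygon)
```

Here `bbox` has the same values as the hand-written `bbox` of the item.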

<mark>TODO: demonstrate the other catalog endpoints</mark>

NOTE: also use the endpoints listed in the Swagger documentation.