# ADGS endpoints demo

In this demo we call the rs-server ADGS HTTP endpoints to:

  * List the available ADGS products
  * Download some products to local storage and to an S3 bucket
  * Monitor the download status from the database

## Quick links

**Swagger UI**

  * http://localhost:8001/docs (local)
  * https://dev-rspy.esa-copernicus.eu (cluster)

In [1]:
# Set local or cluster configuration
import os

if os.getenv("RSPY_LOCAL_MODE") == "1":
    RS_SERVER_ROOT_URL = "http://rs-server-adgs:8000"
    HEADERS={}
    local_mode = True
else:
    RS_SERVER_ROOT_URL = os.environ["RSPY_WEBSITE"]
    HEADERS={"headers": {"x-api-key": os.environ["RSPY_APIKEY"]}}
    local_mode = False

print(f"Using: {RS_SERVER_ROOT_URL}")

# Define some variables
endpoint=f"{RS_SERVER_ROOT_URL}/adgs/aux"
datetime="2014-01-01T12:00:00Z/2023-12-30T12:00:00Z"

Using: http://rs-server-adgs:8000
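The `datetime` variable above is a closed interval written as `start/end` in ISO 8601 UTC, the usual form for STAC item searches. A minimal sketch of a helper that builds such a string and rejects reversed bounds before any request is sent; `make_datetime_interval` is a hypothetical name used only for illustration.

```python
import datetime as dt

# Same layout as the interval in the first cell, e.g. 2014-01-01T12:00:00Z
ISO_UTC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def make_datetime_interval(start: dt.datetime, end: dt.datetime) -> str:
    """Build a 'start/end' UTC interval string (hypothetical helper)."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    # Normalise both bounds to UTC so that the trailing 'Z' is correct
    start_utc = start.astimezone(dt.timezone.utc)
    end_utc = end.astimezone(dt.timezone.utc)
    if start_utc > end_utc:
        raise ValueError("start must not be after end")
    return f"{start_utc.strftime(ISO_UTC_FORMAT)}/{end_utc.strftime(ISO_UTC_FORMAT)}"


interval = make_datetime_interval(
    dt.datetime(2014, 1, 1, 12, tzinfo=dt.timezone.utc),
    dt.datetime(2023, 12, 30, 12, tzinfo=dt.timezone.utc),
)
print(interval)  # 2014-01-01T12:00:00Z/2023-12-30T12:00:00Z
```

The module is imported as `dt` on purpose, so that the `datetime` query string defined above is not shadowed.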


In [2]:
import requests
import json

# Call the "search" endpoint
print(f"Call: '{endpoint}/search' with: datetime={datetime!r}")
payload = {
    "datetime": datetime,
    "limit": 3,
}
data = requests.get(f"{endpoint}/search", params=payload, **HEADERS)
data.raise_for_status()

# Get the returned products (STAC items)
products = data.json()["features"]
assert len(products) == 3

# Print the first 3 products
print("Result:")
print(json.dumps(products[:3], indent=2))
print("...")

# Keep only the product names (the STAC item ids)
product_names = [product["id"] for product in products]

Call: 'http://rs-server-adgs:8000/adgs/aux/search' with: datetime='2014-01-01T12:00:00Z/2023-12-30T12:00:00Z'
Result:
[
  {
    "stac_version": "1.0.0",
    "stac_extensions": [
      "https://stac-extensions.github.io/file/v2.1.0/schema.json"
    ],
    "type": "Feature",
    "id": "S2__OPER_AUX_ECMWFD_PDMC_20230216T120000_V20190217T090000_20190217T210000.TGZ",
    "geometry": null,
    "properties": {
      "adgs:id": "id3",
      "datetime": "2023-02-17T09:00:00.000Z",
      "start_datetime": "2023-02-17T09:00:00.000Z",
      "end_datetime": "2023-02-17T21:00:00.000Z",
      "created": "2023-02-16T12:00:00.000Z"
    },
    "links": [],
    "assets": {
      "file": {
        "file:size": 8326253
      }
    }
  },
  {
    "stac_version": "1.0.0",
    "stac_extensions": [
      "https://stac-extensions.github.io/file/v2.1.0/schema.json"
    ],
    "type": "Feature",
    "id": "S2__OPER_AUX_ECMWFD_PDMC_20200216T120000_V20190217T090000_20190217T210000.TGZ",
    "geometry": null,
    "p

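Each returned product is a STAC item: its `id` is the product name, while the ADGS identifier and the file size sit under `properties["adgs:id"]` and `assets["file"]["file:size"]`. A minimal sketch that flattens a `search` response into `(name, adgs_id, size)` tuples, using a trimmed copy of the first item shown above as input; `summarize_features` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional, Tuple


def summarize_features(feature_collection: Dict[str, Any]) -> List[Tuple[str, str, Optional[int]]]:
    """Return (name, adgs_id, size_in_bytes) for each STAC item (hypothetical helper)."""
    rows = []
    for item in feature_collection.get("features", []):
        # The "file" asset may be missing; its size is then reported as None
        size = item.get("assets", {}).get("file", {}).get("file:size")
        rows.append((item["id"], item["properties"]["adgs:id"], size))
    return rows


# Trimmed copy of the first item returned by the search cell
sample_response = {
    "features": [
        {
            "id": "S2__OPER_AUX_ECMWFD_PDMC_20230216T120000_V20190217T090000_20190217T210000.TGZ",
            "properties": {"adgs:id": "id3", "datetime": "2023-02-17T09:00:00.000Z"},
            "assets": {"file": {"file:size": 8326253}},
        }
    ]
}

for name, adgs_id, size in summarize_features(sample_response):
    print(f"{adgs_id}: {name} ({size} bytes)")
```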
In [3]:
# We can also fetch a single product
# Call the "search" endpoint
print(f"Call: '{endpoint}/search' with: datetime={datetime!r}&limit=1")
payload = {
    "datetime": datetime,
    "limit": 1,
}
data = requests.get(f"{endpoint}/search", params=payload, **HEADERS)
data.raise_for_status()

# Get the returned products (STAC items)
products = data.json()["features"]
assert len(products) == 1

# Print the result
print("Result:")
print(json.dumps(products, indent=2))
print("...")

Call: 'http://rs-server-adgs:8000/adgs/aux/search' with: datetime='2014-01-01T12:00:00Z/2023-12-30T12:00:00Z'&limit=1
Result:
[
  {
    "stac_version": "1.0.0",
    "stac_extensions": [
      "https://stac-extensions.github.io/file/v2.1.0/schema.json"
    ],
    "type": "Feature",
    "id": "S2__OPER_AUX_ECMWFD_PDMC_20190216T120000_V20190217T090000_20190217T210000.TGZ",
    "geometry": null,
    "properties": {
      "adgs:id": "2b17b57d-fff4-4645-b539-91f305c27c69",
      "datetime": "2019-02-17T09:00:00.000Z",
      "start_datetime": "2019-02-17T09:00:00.000Z",
      "end_datetime": "2019-02-17T21:00:00.000Z",
      "created": "2019-02-16T12:00:00.000Z"
    },
    "links": [],
    "assets": {
      "file": {
        "file:size": 8326253
      }
    }
  }
]
...
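`requests.get(url, payload)` passes `payload` as the `params` argument, so the dictionary ends up URL-encoded in the query string. The standard library's `urllib.parse.urlencode` performs this kind of encoding and is a quick way to see what the `search` call above sends; this is a sketch, and the exact string `requests` builds may differ in minor details.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

search_url = "http://rs-server-adgs:8000/adgs/aux/search"
search_params = {"datetime": "2014-01-01T12:00:00Z/2023-12-30T12:00:00Z", "limit": 1}

# ':' and '/' inside the interval are percent-encoded as %3A and %2F
full_url = f"{search_url}?{urlencode(search_params)}"
print(full_url)

# Decoding the query string gives back the original values (as strings)
decoded = parse_qs(urlsplit(full_url).query)
print(decoded)
```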


In [4]:
# The "search" endpoint has initialised the database with the product info.
# Call the "status" endpoint to get each product's info from its name.
all_status = []
print(f"Call: '{endpoint}/status' with: name='...'")
for name in product_names:
    data = requests.get(f"{endpoint}/status", params={"name": name}, **HEADERS)
    data.raise_for_status()
    all_status.append(data.json())

# Print the first 5 statuses
print("Result:")
print(json.dumps(all_status[:5], indent=2))
print("...")

Call: 'http://rs-server-adgs:8000/adgs/aux/status' with: name='...'
Result:
[
  {
    "product_id": "id3",
    "name": "S2__OPER_AUX_ECMWFD_PDMC_20230216T120000_V20190217T090000_20190217T210000.TGZ",
    "available_at_station": "2023-02-16T12:00:00.000000",
    "db_id": 3,
    "download_start": null,
    "download_stop": null,
    "status": "NOT_STARTED",
    "status_fail_message": null
  },
  {
    "product_id": "id2",
    "name": "S2__OPER_AUX_ECMWFD_PDMC_20200216T120000_V20190217T090000_20190217T210000.TGZ",
    "available_at_station": "2020-02-16T12:00:00.000000",
    "db_id": 2,
    "download_start": null,
    "download_stop": null,
    "status": "NOT_STARTED",
    "status_fail_message": null
  },
  {
    "product_id": "2b17b57d-fff4-4645-b539-91f305c27c69",
    "name": "S2__OPER_AUX_ECMWFD_PDMC_20190216T120000_V20190217T090000_20190217T210000.TGZ",
    "available_at_station": "2019-02-16T12:00:00.000000",
    "db_id": 1,
    "download_start": null,
    "download_stop": null,
    

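Each status record carries a `status` field; the download cell further down counts the values `NOT_STARTED`, `IN_PROGRESS`, `FAILED` and `DONE`. A minimal sketch that summarizes a list like `all_status` with `collections.Counter`, using trimmed copies of the first two records shown above; `summarize_status` is a hypothetical helper.

```python
from collections import Counter

# Status values counted by the download cell further down
KNOWN_STATUSES = ("NOT_STARTED", "IN_PROGRESS", "FAILED", "DONE")

sample_status = [
    {"product_id": "id3", "db_id": 3, "download_stop": None, "status": "NOT_STARTED"},
    {"product_id": "id2", "db_id": 2, "download_stop": None, "status": "NOT_STARTED"},
]


def summarize_status(records):
    """Count records per status; known statuses are always listed, even at 0."""
    counts = Counter(record["status"] for record in records)
    summary = {status: counts.get(status, 0) for status in KNOWN_STATUSES}
    # Keep any unexpected status value instead of failing on it
    for status, count in counts.items():
        summary.setdefault(status, count)
    return summary


print(summarize_status(sample_status))
```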
---
**NOTE**

In local mode, you can also monitor the database using pgAdmin.

---

In [5]:
# We'll use boto3 to monitor the S3 bucket.
# Note: S3_ACCESSKEY, S3_SECRETKEY, S3_ENDPOINT and S3_REGION are given
# in the docker-compose.yml or ~/.s3cfg file.
!pip install boto3
import boto3
import os

s3_session = boto3.session.Session()
s3_client = s3_session.client(
    service_name="s3",
    aws_access_key_id=os.environ["S3_ACCESSKEY"],
    aws_secret_access_key=os.environ["S3_SECRETKEY"],
    endpoint_url=os.environ["S3_ENDPOINT"],
    region_name=os.environ["S3_REGION"],
)
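The client above reads four environment variables and stops with a bare `KeyError` on the first one that is missing. A minimal sketch that reports every missing setting at once before the client is created; `missing_settings` is a hypothetical helper, and the variable names are the ones used in the cell above.

```python
import os
from typing import List, Mapping, Sequence

# Environment variables read by the boto3 client cell above
S3_SETTINGS = ("S3_ACCESSKEY", "S3_SECRETKEY", "S3_ENDPOINT", "S3_REGION")


def missing_settings(env: Mapping[str, str], names: Sequence[str] = S3_SETTINGS) -> List[str]:
    """Return the names that are unset or empty in `env`, in the order given."""
    return [name for name in names if not env.get(name)]


missing = missing_settings(os.environ)
if missing:
    print("Missing S3 settings:", ", ".join(missing))
else:
    print("All S3 settings are defined")
```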



In [6]:
from pathlib import Path

# S3 bucket name and sub-directories
bucket_name = "test-data"
bucket_dir = "adgs/data"

# Full bucket name + subdirs
bucket_url = f"s3://{bucket_name}/{bucket_dir}"

# In local mode, the download directory is built from the RSPY_WORKING_DIR environment variable
if local_mode:
    local_download_dir = Path(os.environ["RSPY_WORKING_DIR"]) / bucket_dir
else:
    local_download_dir = ""

# Clean existing files
def clean_existing():

    # If the S3 bucket already exists, remove the existing products from it
    if bucket_name in [bucket["Name"] for bucket in s3_client.list_buckets()["Buckets"]]:
        for name in product_names:
            s3_client.delete_object(Bucket=bucket_name, Key=f"{bucket_dir}/{name}")
    
    # Else create the bucket
    else:
        s3_client.create_bucket(Bucket=bucket_name)
    
    # Create the local download dir if missing
    if local_mode:
        local_download_dir.mkdir(parents=True, exist_ok=True)
        
        # Remove all local files if they exist
        for name in product_names:
            file = local_download_dir / name
            if file.is_file():
                file.unlink()

import time

# Check that the files were downloaded locally
def check_existing_local():

    # Wait 1 second first: without this pause the check sometimes fails
    time.sleep(1)
    for name in product_names:
        file = Path(local_download_dir) / name
        if not file.is_file():
            raise RuntimeError(f"{file} is missing locally")
        print(f"{file} exists")

# Check that the files were uploaded into the S3 bucket.
# When uploading to S3, the local files are not kept, so only the bucket is checked.
def check_existing_s3():
    time.sleep(1)
    try:
        all_s3_files = [key["Key"] for key in s3_client.list_objects(Bucket=bucket_name)["Contents"]]
    except KeyError:
        # "Contents" is absent when the bucket is empty
        all_s3_files = []
    for name in product_names:
        bucket_file = f"{bucket_dir}/{name}"
        if bucket_file not in all_s3_files:
            raise RuntimeError(f"s3://{bucket_name}/{bucket_file} is missing from the S3 bucket")
        print(f"s3://{bucket_name}/{bucket_file} exists")

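Both checks above sleep a fixed second before looking for the files. A polling helper with a timeout tolerates slower writes without waiting longer than needed. This is a minimal sketch; `wait_until` is a hypothetical name, and the default timeout and interval are arbitrary.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Call `predicate` until it returns True or `timeout` seconds have elapsed.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: a condition that becomes true on the third check
calls = {"n": 0}


def third_time_lucky() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(third_time_lucky, timeout=5, interval=0.01))
```

In this notebook, `check_existing_local` could for instance replace its fixed sleep with `wait_until(lambda: all((Path(local_download_dir) / name).is_file() for name in product_names))`.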
---
**NOTE**

In local mode, you can also monitor the S3 bucket using the MinIO console: http://127.0.0.1:9001/browser with:

  * Username: _minio_
  * Password: _Strong#Pass#1234_

---

In [7]:
import asyncio
from typing import Callable

print(f"Call: '{endpoint}' with: name='...' local={local_download_dir!r} obs={bucket_url!r}")

# Call the ADGS endpoint to download one product in the background
# and (optionally) upload it to the S3 bucket.
async def download_one(name: str, save_to_s3: bool):

    params = {"name": name, "local": local_download_dir}
    if save_to_s3:
        params["obs"] = bucket_url

    # requests is blocking: run it in a worker thread so that the calls
    # for the different products can overlap
    data = await asyncio.to_thread(requests.get, endpoint, params, **HEADERS)
    data.raise_for_status()

# Send all the download requests concurrently
async def download_all(save_to_s3: bool, download_one: Callable = download_one):
    async with asyncio.TaskGroup() as group:
        for name in product_names:
            group.create_task(download_one(name, save_to_s3))

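    #
    # Then call the "status" endpoint until every product is DONE or FAILED.
    # The downloads keep running in the background after the endpoint calls
    # return, as the first status line of the output below shows.
    #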

    all_done = False
    while not all_done: 

        # Count the number of products not started, in progress, etc.
        all_status = {"NOT_STARTED": 0, "IN_PROGRESS": 0, "FAILED": 0, "DONE": 0}
        for name in product_names:
            
            # Call the "status" endpoint
            data = requests.get(f"{endpoint}/status", {"name": name}, **HEADERS)
            data.raise_for_status()
            all_status[(data.json())["status"]] += 1

        # Print result
        print (" / ".join ([f"{status}:{count}" for status, count in all_status.items()]))

        if (all_status["DONE"] + all_status["FAILED"]) >= len(product_names):
            all_done = True
        else:
            # Wait without blocking the event loop
            await asyncio.sleep(1)

clean_existing()

if local_mode:
    print("Download everything to the local directory, not s3:")
    await download_all(save_to_s3=False)
    check_existing_local()

print("\nDownload everything and upload to S3:")
await download_all(save_to_s3=True)
check_existing_s3()

Call: 'http://rs-server-adgs:8000/adgs/aux' with: name='...' local=PosixPath('/rspy/working/dir/adgs/data') obs='s3://test-data/adgs/data'
Download everything to the local directory, not s3:
NOT_STARTED:0 / IN_PROGRESS:3 / FAILED:0 / DONE:0
NOT_STARTED:0 / IN_PROGRESS:0 / FAILED:0 / DONE:3
/rspy/working/dir/adgs/data/S2__OPER_AUX_ECMWFD_PDMC_20230216T120000_V20190217T090000_20190217T210000.TGZ exists
/rspy/working/dir/adgs/data/S2__OPER_AUX_ECMWFD_PDMC_20200216T120000_V20190217T090000_20190217T210000.TGZ exists
/rspy/working/dir/adgs/data/S2__OPER_AUX_ECMWFD_PDMC_20190216T120000_V20190217T090000_20190217T210000.TGZ exists

Download everything and upload to S3:
NOT_STARTED:0 / IN_PROGRESS:3 / FAILED:0 / DONE:0
NOT_STARTED:0 / IN_PROGRESS:0 / FAILED:0 / DONE:3
s3://test-data/adgs/data/S2__OPER_AUX_ECMWFD_PDMC_20230216T120000_V20190217T090000_20190217T210000.TGZ exists
s3://test-data/adgs/data/S2__OPER_AUX_ECMWFD_PDMC_20200216T120000_V20190217T090000_20190217T210000.TGZ exists
s3://test-d

In [8]:
# Do the same with Prefect
!pip install prefect
from prefect import flow, task

@task
async def download_one_with_prefect(name: str, save_to_s3: bool):
    return await download_one(name, save_to_s3)

@flow(name="download adgs products")
async def download_all_with_prefect(save_to_s3: bool):
    return await download_all(save_to_s3, download_one_with_prefect)

clean_existing()

if local_mode:
    print("[Prefect] Download everything to the local directory, not s3:")
    await download_all_with_prefect(save_to_s3=False)
    check_existing_local()

print("\n[Prefect] Download everything again and upload to S3:")
await download_all_with_prefect(save_to_s3=True)
check_existing_s3()

[Prefect] Download everything to the local directory, not s3:


NOT_STARTED:0 / IN_PROGRESS:3 / FAILED:0 / DONE:0
NOT_STARTED:0 / IN_PROGRESS:0 / FAILED:0 / DONE:3


/rspy/working/dir/adgs/data/S2__OPER_AUX_ECMWFD_PDMC_20230216T120000_V20190217T090000_20190217T210000.TGZ exists
/rspy/working/dir/adgs/data/S2__OPER_AUX_ECMWFD_PDMC_20200216T120000_V20190217T090000_20190217T210000.TGZ exists
/rspy/working/dir/adgs/data/S2__OPER_AUX_ECMWFD_PDMC_20190216T120000_V20190217T090000_20190217T210000.TGZ exists

[Prefect] Download everything again and upload to S3:


NOT_STARTED:0 / IN_PROGRESS:3 / FAILED:0 / DONE:0
NOT_STARTED:0 / IN_PROGRESS:0 / FAILED:0 / DONE:3


s3://test-data/adgs/data/S2__OPER_AUX_ECMWFD_PDMC_20230216T120000_V20190217T090000_20190217T210000.TGZ exists
s3://test-data/adgs/data/S2__OPER_AUX_ECMWFD_PDMC_20200216T120000_V20190217T090000_20190217T210000.TGZ exists
s3://test-data/adgs/data/S2__OPER_AUX_ECMWFD_PDMC_20190216T120000_V20190217T090000_20190217T210000.TGZ exists


---
**NOTE**

In local mode, open the Prefect dashboard: http://127.0.0.1:4200

---

In [9]:
# Import the class under another name so that it does not shadow
# the `datetime` query string defined in the first cell
from datetime import datetime as dt

# Timestamp format of the status fields (they carry no timezone offset)
dt_format = "%Y-%m-%dT%H:%M:%S.%f"

# Compute timeliness as the download stop date minus the publication date.
# Call the "status" endpoint.
print("Timeliness for:")
for name in product_names:
    data = requests.get(f"{endpoint}/status", params={"name": name}, **HEADERS)
    data.raise_for_status()
    values = data.json()
    publication = dt.strptime(values["available_at_station"], dt_format)
    stop = dt.strptime(values["download_stop"], dt_format)
    timeliness = stop - publication
    print(f"  - {name}: {timeliness}")

Timeliness for:
  - S2__OPER_AUX_ECMWFD_PDMC_20230216T120000_V20190217T090000_20190217T210000.TGZ: 423 days, 20:03:48.274816
  - S2__OPER_AUX_ECMWFD_PDMC_20200216T120000_V20190217T090000_20190217T210000.TGZ: 1519 days, 20:03:48.395554
  - S2__OPER_AUX_ECMWFD_PDMC_20190216T120000_V20190217T090000_20190217T210000.TGZ: 1884 days, 20:03:48.498062
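The timeliness cell assumes every product already has a `download_stop`. For a product that has not been downloaded yet the field is `null`, and `strptime` then raises a `TypeError`. A minimal sketch of a helper that returns `None` in that case; `compute_timeliness` is a hypothetical name, the timestamp format is the one used above, and the sample timestamps are made up for illustration.

```python
from datetime import datetime as dt, timedelta
from typing import Optional

# Timestamp format of the status fields, as in the timeliness cell
DT_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def compute_timeliness(status: dict) -> Optional[timedelta]:
    """Return download_stop - available_at_station, or None if either is missing."""
    if not status.get("download_stop") or not status.get("available_at_station"):
        return None
    published = dt.strptime(status["available_at_station"], DT_FORMAT)
    stopped = dt.strptime(status["download_stop"], DT_FORMAT)
    return stopped - published


# Made-up records shaped like the "status" endpoint responses
done = {
    "available_at_station": "2023-02-16T12:00:00.000000",
    "download_stop": "2023-02-17T13:30:00.500000",
}
pending = {"available_at_station": "2023-02-16T12:00:00.000000", "download_stop": None}

print(compute_timeliness(done))     # 1 day, 1:30:00.500000
print(compute_timeliness(pending))  # None
```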
