RSPY DPR Processor mockup demo:


The DPRProcessor is a class that simulates the processing part performed by the eopf-cpm triggering module. The input of the processor is a YAML config file listing all the input files and the expected output locations (local or S3).

The implemented mockup performs the following actions:
1. Checks the validity of the input YAML file (chunk/aux existence, naming convention).
2. Downloads the zarr input from the public OVH S3 storage, based on the product type requested in the YAML payload.
3. Updates the .zattrs with our processor name (RSPY_DprMockupProcessor) and a timestamp (if the product is zipped, the processor updates the .zattrs inside the .zip without extracting the files).
4. Computes the CRC of the updated .zattrs.
5. Updates the VVV field of the product name (as per the EOPF-CPM PSD) with the computed CRC, so that the processor can be called multiple times with the same input and generate a different output each time.
6. Uploads the products to the S3 server (MinIO for this demo).
7. Removes the locally downloaded products (if a flag is set).
8. Retrieves the .zattrs in a serialisable format (dict), so that it can be uploaded to the catalog in a later step of our processing chain.
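
Steps 3 to 5 above can be illustrated with a minimal sketch. It is not the mockup's actual code: the attribute keys (`processor_name`, `processing_time`), the use of CRC-32 truncated to three hexadecimal characters, and the assumption that VVV is the last three characters before the first extension are all hypothetical choices made for illustration. The product name used in the example is made up.

```python
import json
import zlib
from datetime import datetime, timezone


def stamp_zattrs(zattrs: dict, processor_name: str) -> dict:
    """Return a copy of the attributes with a processor name and a UTC timestamp added."""
    stamped = dict(zattrs)
    # Hypothetical keys: the real mockup may store these values elsewhere in .zattrs.
    stamped["processor_name"] = processor_name
    stamped["processing_time"] = datetime.now(timezone.utc).isoformat()
    return stamped


def crc_suffix(zattrs: dict) -> str:
    """Compute a 3-character uppercase hex code from the serialised attributes.

    Assumption: CRC-32 over a key-sorted JSON dump, keeping the lowest 12 bits.
    """
    payload = json.dumps(zattrs, sort_keys=True).encode("utf-8")
    return f"{zlib.crc32(payload) & 0xFFF:03X}"


def rename_with_crc(product_name: str, crc: str) -> str:
    """Replace the last three characters before the first '.' with the CRC code."""
    stem, dot, extension = product_name.partition(".")
    return f"{stem[:-3]}{crc}{dot}{extension}"


# Made-up attributes and product name, for illustration only.
stamped = stamp_zattrs({"title": "example"}, "RSPY_DprMockupProcessor")
new_name = rename_with_crc("EXAMPLE_PRODUCT_T000.zarr.zip", crc_suffix(stamped))
print(new_name)
```

Because the timestamp changes on every run, the serialised attributes change too, so the CRC code will usually differ between runs on the same input. With only three hexadecimal characters, collisions remain possible.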

In [None]:
!pip install boto3
import requests
import json
import yaml
import pprint

In [None]:
yaml_payload = """
general_configuration:
  logging:
    level: DEBUG
  triggering__validate_run: true
  triggering__use_default_filename: true
  triggering__use_basic_logging: true
  triggering__load_default_logging: false
breakpoints:
workflow:
- step: 1
  active: true
  module: rs.dpr.mockup # string corresponding to the python path of the module
  processing_unit: DprMockupProcessor # EOProcessingUnit class name
  name: DprMockupProcessor # identifier for the processing unit
  inputs:
    in1: CADU1
    in2: CADU2 # One CADU{N} entry per CADU chunk we want to pass as input. In this example we consider 2 chunks
    in3: AUX1
    in4: AUX2 # One AUX{N} entry per ADGS file we want to pass as input. In this example we consider 2 aux files
  outputs:
    out: outputs
  parameters:
    product_types: # List of EOPF product types we want to generate. In this example we simulate S1L0 processor that generates 4 products
      - S1SSMOCN
I/O:
  inputs_products:
  - id: CADU1
    path: chunk/S1/S1A_20231121072204051312/DCS_04_S1A_20231121072204051312_ch1_DSDB_00023.raw
    store_type: zarr
    store_params: {}
  - id: CADU2
    path: chunk/S1/S1A_20231121072204051312/DCS_04_S1A_20231121072204051312_ch1_DSDB_00022.raw
    store_type: zarr
    store_params: {}
  - id: AUX1
    path: AUX/S1/S1A_OPER_AMV_ERRMAT_MPC__20201124T040009_V20000101T000000_20201123T131345.EOF.zip
    store_type: zarr
    store_params: {}
  - id: AUX2
    path: AUX/S1/S1A_AUX_PP2_V20190228T092500_G20220228T120000.SAFE.zip
    store_type: zarr
    store_params: {}
  output_products:
  - id: outputs
    path: src/DPR/data/ # output folder or S3 bucket
    type: folder
    store_type: zarr
    store_params: {}
dask_context: {}
logging: {}
config: {}
"""

---
**NOTE**

You can also monitor the s3 bucket using the minio console: http://127.0.0.1:9001/browser with:

  * Username: _minio_
  * Password: _Strong#Pass#1234_

---

In [None]:
# We'll use boto3 to monitor the S3 bucket.
# Note: S3_ACCESSKEY, S3_SECRETKEY and S3_ENDPOINT are given in the docker-compose.yml file.
# S3_REGION is read from the environment as well.
import boto3
import os

s3_session = boto3.session.Session()
s3_client = s3_session.client(
    service_name="s3",
    aws_access_key_id=os.environ["S3_ACCESSKEY"],
    aws_secret_access_key=os.environ["S3_SECRETKEY"],
    endpoint_url=os.environ["S3_ENDPOINT"],
    region_name=os.environ["S3_REGION"],
)
bucket_name = "test-data"
bucket_dir = "zarr/dpr_processor_output"
bucket_url = f"s3://{bucket_name}/{bucket_dir}"

# If the bucket already exists, delete all its objects so that each demo run starts fresh.
if bucket_name in [bucket["Name"] for bucket in s3_client.list_buckets()["Buckets"]]:
    # A single listing call returns at most 1000 keys, so paginate over the whole bucket.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            s3_client.delete_object(Bucket=bucket_name, Key=obj["Key"])
else:
    s3_client.create_bucket(Bucket=bucket_name)

print("Is the bucket empty now?", "Contents" not in s3_client.list_objects(Bucket=bucket_name))

Parse the YAML payload and post it as JSON over HTTP to the simulator web server endpoint.
The output of the run() method is a list of all STAC-compatible .zattrs.

In [None]:
# Parse the YAML payload into a dict; requests serialises it to JSON through the json= argument.
yaml_data = yaml.safe_load(yaml_payload)

dpr_simulator_endpoint = "http://dpr-simulator:8000/run"  # host = the simulator container name
response = requests.post(dpr_simulator_endpoint, json=yaml_data)
response.raise_for_status()  # fail early if the simulator returned an HTTP error

pp = pprint.PrettyPrinter(indent=4)
for attr in response.json():
    pp.pprint(attr)

In [None]:
# List the uploaded objects; .get() avoids a KeyError if the bucket is still empty.
s3_client.list_objects(Bucket=bucket_name).get("Contents", [])
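
The raw listing can be long when a product is stored as many zarr objects. The sketch below is a hypothetical helper that groups the listing by product and reports the object count and total size. It assumes each key contains a path component with `.zarr` in it (for example a `.zarr` folder or a `.zarr.zip` file), which is a guess about the output layout. Keys without such a component are grouped under their file name. The sample listing is made up.

```python
from collections import defaultdict


def summarise_products(contents: list[dict]) -> dict[str, dict]:
    """Group S3 listing entries by product and sum object counts and sizes."""
    summary = defaultdict(lambda: {"objects": 0, "bytes": 0})
    for obj in contents:
        parts = obj["Key"].split("/")
        # Assumption: the product is the first path component containing ".zarr".
        product = next((part for part in parts if ".zarr" in part), parts[-1])
        summary[product]["objects"] += 1
        summary[product]["bytes"] += obj.get("Size", 0)
    return dict(summary)


# Made-up listing, shaped like the 'Contents' entries returned by list_objects.
sample_listing = [
    {"Key": "zarr/dpr_processor_output/EXAMPLE_A.zarr/.zattrs", "Size": 120},
    {"Key": "zarr/dpr_processor_output/EXAMPLE_A.zarr/data/0.0", "Size": 880},
    {"Key": "zarr/dpr_processor_output/EXAMPLE_B.zarr.zip", "Size": 5000},
]
for product, stats in summarise_products(sample_listing).items():
    print(product, stats)
```

In the notebook, the helper could be called on the `Contents` list returned by `s3_client.list_objects(Bucket=bucket_name)`.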