## PAVICS Web Processing Services using OGC-API integration with Weaver

When the [Weaver component](https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/components#weaver)
is enabled, all WPS *birds* registered as process *providers* are automatically accessible through the
[OGC-API - Processes][ogcapi-proc] interface from the endpoint where [Weaver][weaver] is defined.

[weaver]: https://github.com/crim-ca/weaver
[ogcapi-proc]: https://github.com/opengeospatial/ogcapi-processes/


In [1]:
import json
import requests
import os
import time
import urllib3

WEAVER_TEST_FQDN = os.getenv("WEAVER_TEST_FQDN", os.getenv("PAVICS_HOST", "pavics.ouranos.ca"))
WEAVER_TEST_URL = os.getenv("WEAVER_TEST_URL", "https://{}/weaver".format(WEAVER_TEST_FQDN))
WEAVER_TEST_SSL_VERIFY = str(os.getenv("WEAVER_TEST_SSL_VERIFY", "true")).lower() in ["true", "1", "on", "yes"]
WEAVER_TEST_DEFAULT_BIRDS = "catalog, finch, flyingpigeon, hummingbird, malleefowl, raven"
WEAVER_TEST_KNOWN_BIRDS = os.getenv("WEAVER_TEST_KNOWN_BIRDS", WEAVER_TEST_DEFAULT_BIRDS)
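# Drop empty entries so that an empty variable yields an empty list (otherwise the assertion below can never fail)
WEAVER_TEST_KNOWN_BIRDS = [bird.strip() for bird in WEAVER_TEST_KNOWN_BIRDS.split(",") if bird.strip()]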
WEAVER_TEST_DEFAULT_FILE = "/twitcher/ows/proxy/thredds/dodsC/birdhouse/nrcan/nrcan_canada_daily/tasmin/nrcan_canada_daily_tasmin_2013.nc"
WEAVER_TEST_FILE = os.getenv("WEAVER_TEST_FILE", "https://{}{}".format(WEAVER_TEST_FQDN, WEAVER_TEST_DEFAULT_FILE))

WEAVER_HEADERS = {"Accept": "application/json", "Content-Type": "application/json"}

if not WEAVER_TEST_SSL_VERIFY:
    urllib3.disable_warnings()

print("Variables:")
variables = [
    ("WEAVER_TEST_FQDN", WEAVER_TEST_FQDN),
    ("WEAVER_TEST_URL", WEAVER_TEST_URL),
    ("WEAVER_TEST_SSL_VERIFY", WEAVER_TEST_SSL_VERIFY),
    ("WEAVER_TEST_FILE", WEAVER_TEST_FILE),
    ("WEAVER_TEST_KNOWN_BIRDS", WEAVER_TEST_KNOWN_BIRDS),
]
max_len = max(len(var[0]) for var in variables) + 2
msg = f"  {{:{max_len}}}{{}}"
for var, val in variables:
    print(msg.format(var, val))

    
assert len(WEAVER_TEST_KNOWN_BIRDS) >= 1, "No test WPS provider provided in 'WEAVER_TEST_KNOWN_BIRDS'."


Variables:
  WEAVER_TEST_FQDN         host-140-88.rdext.crim.ca
  WEAVER_TEST_URL          https://host-140-88.rdext.crim.ca/weaver
  WEAVER_TEST_SSL_VERIFY   True
  WEAVER_TEST_FILE         https://host-140-88.rdext.crim.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/nrcan/nrcan_canada_daily/tasmin/nrcan_canada_daily_tasmin_2013.nc
  WEAVER_TEST_KNOWN_BIRDS  ['catalog', 'finch', 'flyingpigeon', 'hummingbird', 'malleefowl', 'raven']
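
The settings above are each resolved from an environment variable with a fallback default. As a minimal sketch of that resolution order, the function below takes a plain mapping instead of the real environment so it can be tried in isolation. The names `parse_bool` and `resolve_weaver_settings` are hypothetical helpers introduced for illustration only; they are not part of the notebook or of Weaver.

```python
def parse_bool(value):
    """Interpret common truthy strings the same way the configuration cell does."""
    return str(value).lower() in ["true", "1", "on", "yes"]


def resolve_weaver_settings(env):
    """Resolve test settings from a mapping (e.g. os.environ) using the notebook's defaults.

    Hypothetical helper: only mirrors the fallback order of the configuration cell.
    """
    # WEAVER_TEST_FQDN wins over PAVICS_HOST, which wins over the built-in default.
    fqdn = env.get("WEAVER_TEST_FQDN", env.get("PAVICS_HOST", "pavics.ouranos.ca"))
    return {
        "fqdn": fqdn,
        # The URL is derived from the host unless explicitly overridden.
        "url": env.get("WEAVER_TEST_URL", f"https://{fqdn}/weaver"),
        "ssl_verify": parse_bool(env.get("WEAVER_TEST_SSL_VERIFY", "true")),
    }


# 'example.org' is a placeholder host used only for this demonstration.
settings = resolve_weaver_settings({"PAVICS_HOST": "example.org", "WEAVER_TEST_SSL_VERIFY": "0"})
print(settings)
```

Passing `os.environ` instead of the sample mapping would resolve the values from the actual environment.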


### Define some utility functions for displaying test results

In [2]:
def json_dump(_json):
    try:
        if isinstance(_json, str):
            _json = json.loads(_json)
        return json.dumps(_json, indent=2, ensure_ascii=False)
    except Exception:
        return str(_json)


def json_print(_json):
    print(json_dump(_json))

### Start with a simple listing of the WPS providers registered in Weaver


In [3]:
print("Listing WPS providers registered under Weaver...\n")

path = f"{WEAVER_TEST_URL}/providers"
resp = requests.get(path, headers=WEAVER_HEADERS, verify=WEAVER_TEST_SSL_VERIFY)
assert resp.status_code == 200, f"Error during WPS bird providers listing:\n{json_dump(resp.text)}"
body = resp.json()
json_print(body)

assert "providers" in body and len(body["providers"]), "Could not find Weaver WPS providers"
bird_ids = [bird["id"] for bird in body["providers"]]
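# Report which providers are missing instead of only signalling that some are
missing_birds = [bird for bird in WEAVER_TEST_KNOWN_BIRDS if bird not in bird_ids]
assert not missing_birds, f"Could not find all expected Weaver WPS providers, missing: {missing_birds}"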


Listing WPS providers registered under Weaver...

{
  "providers": [
    {
      "id": "catalog",
      "title": "PyWPS Processing Service",
      "abstract": "PyWPS is an implementation of the Web Processing Service standard from the Open Geospatial Consortium. PyWPS is written in Python.",
      "url": "https://host-140-88.rdext.crim.ca/weaver/providers/catalog",
      "public": false
    },
    {
      "id": "finch",
      "title": "Finch",
      "abstract": "A Web Processing Service for Climate Indicators.",
      "url": "https://host-140-88.rdext.crim.ca/weaver/providers/finch",
      "public": false
    },
    {
      "id": "flyingpigeon",
      "title": "Flyingpigeon",
      "abstract": "A Web Processing Service Testbed.",
      "url": "https://host-140-88.rdext.crim.ca/weaver/providers/flyingpigeon",
      "public": false
    },
    {
      "id": "hummingbird",
      "title": "Hummingbird 0.5_dev",
      "abstract": "WPS processes for general tools used in the climate science c

### Obtain the WPS processes converted to OGC-API by Weaver from the original WPS provider endpoints

For each registered provider, Weaver sends a *GetCapabilities* WPS request to the remote endpoint and parses
the XML result in order to form the corresponding OGC-API JSON content.

In [4]:
print("Listing WPS provider processes converted to OGC-API interface by Weaver:\n")

process_locations = []
for bird in bird_ids:
    path = f"{WEAVER_TEST_URL}/providers/{bird}/processes"
    resp = requests.get(path, headers=WEAVER_HEADERS, verify=WEAVER_TEST_SSL_VERIFY)
    assert resp.status_code == 200, f"Error during WPS bird processes retrieval:\n[{json_dump(resp.text)}]"
    body = resp.json()
    for process in body["processes"]:
        process_desc_url = f"{path}/{process['id']}"
        process_locations.append(process_desc_url)
        print(" -", process_desc_url)
assert len(process_locations), "Could not find any process!"

Listing WPS provider processes converted to OGC-API interface by Weaver:

 - https://host-140-88.rdext.crim.ca/weaver/providers/catalog/processes/getpoint
 - https://host-140-88.rdext.crim.ca/weaver/providers/catalog/processes/ncplotly
 - https://host-140-88.rdext.crim.ca/weaver/providers/catalog/processes/pavicrawler
 - https://host-140-88.rdext.crim.ca/weaver/providers/catalog/processes/pavicsearch
 - https://host-140-88.rdext.crim.ca/weaver/providers/catalog/processes/pavicsupdate
 - https://host-140-88.rdext.crim.ca/weaver/providers/catalog/processes/pavicsvalidate
 - https://host-140-88.rdext.crim.ca/weaver/providers/catalog/processes/period2indices
 - https://host-140-88.rdext.crim.ca/weaver/providers/catalog/processes/pavicstestdocs
 - https://host-140-88.rdext.crim.ca/weaver/providers/finch/processes/tg
 - https://host-140-88.rdext.crim.ca/weaver/providers/finch/processes/wind_speed_from_vector
 - https://host-140-88.rdext.crim.ca/weaver/providers/finch/processes/wind_vector_fr
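
To give an idea of the kind of conversion described at the start of this section, the sketch below parses a small WPS 1.0.0 *GetCapabilities* document and builds a JSON-like process listing from it. This is an illustration under stated assumptions, not Weaver's actual implementation: the function `capabilities_to_processes`, the sample XML, and the process title in it are all made up for the example.

```python
import json
import xml.etree.ElementTree as ET

# Namespaces used by WPS 1.0.0 capabilities documents.
NS = {
    "wps": "http://www.opengis.net/wps/1.0.0",
    "ows": "http://www.opengis.net/ows/1.1",
}

# Hand-written sample document (illustrative only, not a real server response).
SAMPLE_CAPABILITIES = """<wps:Capabilities
    xmlns:wps="http://www.opengis.net/wps/1.0.0"
    xmlns:ows="http://www.opengis.net/ows/1.1"
    service="WPS" version="1.0.0">
  <wps:ProcessOfferings>
    <wps:Process wps:processVersion="1.0">
      <ows:Identifier>ncdump</ows:Identifier>
      <ows:Title>NCDump</ows:Title>
    </wps:Process>
  </wps:ProcessOfferings>
</wps:Capabilities>
"""


def capabilities_to_processes(xml_text):
    """Extract a list of process summaries from a WPS GetCapabilities document."""
    root = ET.fromstring(xml_text)
    processes = []
    for proc in root.findall("wps:ProcessOfferings/wps:Process", NS):
        processes.append({
            "id": proc.findtext("ows:Identifier", default="", namespaces=NS),
            "title": proc.findtext("ows:Title", default="", namespaces=NS),
            # Namespaced attributes are stored as '{namespace-uri}name'.
            "version": proc.get(f"{{{NS['wps']}}}processVersion"),
        })
    return processes


processes = capabilities_to_processes(SAMPLE_CAPABILITIES)
print(json.dumps({"processes": processes}, indent=2))
```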

### Dispatched execution of Hummingbird WPS process

Here, we attempt running the same process defined in [WPS_example Notebook](../notebooks/WPS_example.ipynb), but
through the OGC-API interface provided by Weaver.

The process execution received by Weaver gets dispatched to the real WPS location. Weaver then
monitors the process until completion and, once completed, returns the location where results can be retrieved.

In [5]:
assert "hummingbird" in WEAVER_TEST_KNOWN_BIRDS, (
    "Hummingbird not specified within known WPS provider birds by Weaver. Cannot test dispatched process execution..."
)

WEAVER_BIRD_URL = f"{WEAVER_TEST_URL}/providers/hummingbird"
WEAVER_BIRD_PROCESS_URL = f"{WEAVER_BIRD_URL}/processes/ncdump"
assert WEAVER_BIRD_PROCESS_URL in process_locations, (
    f"Could not find WPS bird process URL to test execution [{WEAVER_BIRD_PROCESS_URL}]."
)

print(f"Will run process: [{WEAVER_BIRD_PROCESS_URL}]")


Will run process: [https://host-140-88.rdext.crim.ca/weaver/providers/hummingbird/processes/ncdump]


#### First let's obtain the specific description of the test WPS process

This request will tell us the explicit details of the process, such as its inputs, outputs, and other metadata.
Weaver parses the results retrieved from the original WPS provider using a *DescribeProcess* request to
generate the corresponding OGC-API description. Weaver also adds metadata when it can infer some missing
details from the returned description fields.

In [6]:
print("Getting WPS process description...\n")

resp = requests.get(WEAVER_BIRD_PROCESS_URL, headers=WEAVER_HEADERS, verify=WEAVER_TEST_SSL_VERIFY)
assert resp.status_code == 200, f"Error getting WPS process description:\n[{resp.text}]"
body = resp.json()
json_print(body)

Getting WPS process description...



AssertionError: Error getting WPS process description:
[{"description": "The server has either erred or is incapable of performing the requested operation. Unhandled internal server error.", "code": "Internal Server Error", "error": {"code": 500, "status": "500 Internal Server Error"}}]
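
When a request fails as in the recorded run above, the JSON error body can be condensed into a single line for quicker reading. The sketch below assumes the body has the shape shown in that failure (`description`, `code`, and a nested `error` object with `status`); `summarize_error` is a hypothetical helper and other error responses may be structured differently.

```python
import json


def summarize_error(text):
    """Build a one-line summary from a JSON error body shaped like the one above."""
    try:
        body = json.loads(text)
    except ValueError:
        return text.strip()  # Not JSON: fall back to the raw text.
    if not isinstance(body, dict):
        return text.strip()
    error = body.get("error")
    if not isinstance(error, dict):
        error = {}
    # Prefer the detailed status line, then the short code, then a placeholder.
    status = error.get("status") or body.get("code") or "unknown"
    return f"{status}: {body.get('description', 'no description')}"


# Error body copied from the failed request above.
sample_error = (
    '{"description": "The server has either erred or is incapable of performing the requested operation. '
    'Unhandled internal server error.", "code": "Internal Server Error", '
    '"error": {"code": 500, "status": "500 Internal Server Error"}}'
)
summary = summarize_error(sample_error)
print(summary)
```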

#### Submit the new process execution

Using the OGC-API interface, a WPS process execution is accomplished using a *Job*. That job will tell us the status
location where we can monitor the process execution.

From the previous response, we can see that the process accepts many inputs and format variations.
In this case, we are interested in the input named `dataset` to submit the file defined by `WEAVER_TEST_FILE`.

Following execution of the process, we expect to obtain a raw text data dump of the test file content.
The location of the raw text file is expected to be provided by the output named `output` according to the process description.

In [15]:
print("Submitting process job with:")
print("  File:     [{}]".format(WEAVER_TEST_FILE))
print("  Process:  [{}]".format(WEAVER_BIRD_PROCESS_URL))

data = {
  "mode": "async",  # This tells Weaver to run the process asynchronously, such that we get non-blocking status location
  "response": "document",  # Type of status response (only this mode supported for the time being)
  "inputs": [
    {
      "id": "dataset",  # The target input
      "href": WEAVER_TEST_FILE
    }
  ],
  "outputs": [
    {
      "id": "output",   # Target output we want to retrieve
      "transmissionMode": "reference"  # Ask to provide the result as HTTP reference
    }
  ]
}

path = f"{WEAVER_BIRD_PROCESS_URL}/jobs"
resp = requests.post(path, json=data, headers=WEAVER_HEADERS, verify=WEAVER_TEST_SSL_VERIFY)
assert resp.status_code in [200, 201], f"Error during WPS job submission:\n{resp.text}"
status_location = resp.headers.get("Location")
assert status_location, "Could not find status location URL"
print(f"Job Status Location: [{status_location}]")


Submitting process job with:
  File:     [https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/nrcan/nrcan_canada_daily/tasmin/nrcan_canada_daily_tasmin_2013.nc]
  Process:  [https://pavics.ouranos.ca/weaver/providers/hummingbird/processes/ncdump]
Job Status Location: [https://pavics.ouranos.ca/weaver/providers/hummingbird/processes/ncdump/jobs/11b4f3fc-e108-41ce-88cf-3ab847f21706]
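
The request body submitted above follows a regular pattern, so it can be generated for other inputs and outputs. The sketch below is one possible way to do that; `make_execute_body` is a hypothetical helper, it only covers inputs passed by `href` reference as in this example, and the sample URL is a placeholder.

```python
import json


def make_execute_body(inputs, output_ids):
    """Build an asynchronous job submission body shaped like the one used above.

    inputs: mapping of input ID to file reference (URL).
    output_ids: IDs of the outputs to be returned as HTTP references.
    """
    return {
        "mode": "async",         # Non-blocking execution with a status location.
        "response": "document",  # Same response type as in the cell above.
        "inputs": [{"id": key, "href": href} for key, href in inputs.items()],
        "outputs": [{"id": out, "transmissionMode": "reference"} for out in output_ids],
    }


# Placeholder file reference, for demonstration only.
execute_body = make_execute_body({"dataset": "https://example.org/data.nc"}, ["output"])
print(json.dumps(execute_body, indent=2))
```

The result could then be passed as the `json` argument of `requests.post`, as done in the cell above.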


#### Monitor execution until completion

Now, we wait until the process completes by periodically checking the provided status location of the job.
The job runs asynchronously and is gradually updated with progress and logging details.

Following the job submission request, the `status` can be either `accepted` if the job is still queued pending execution, or
already `running`. Once the job completes, the `status` should be either `succeeded` or `failed`.


In [10]:
# NBVAL_IGNORE_OUTPUT
# ignore status updates of job monitoring

print("Waiting for job completion with polling monitoring of its status...")

timeout = 60  # Define a timeout to abandon this monitoring. Process is relatively quick and shouldn't last too long.
delta = 5
body = {}
while timeout >= 0:
    resp = requests.get(status_location, headers=WEAVER_HEADERS, verify=WEAVER_TEST_SSL_VERIFY)
    assert resp.status_code == 200, "Failed retrieving job status at location [{}]".format(status_location)
    body = resp.json()
    timeout -= delta
    if body["status"] in ["accepted", "running"]:
        print(f"Delay: {delta}s, Duration: {body['duration']}, Status: {body['status']}")
        time.sleep(delta)
        continue
    if body["status"] in ["failed", "succeeded"]:
        break
    raise ValueError(f"Unhandled job status during monitoring: [{body['status']}]")

assert body and "status" in body, f"Could not retrieve job status [{status_location}]"
status = body["status"]

Waiting for job completion with polling monitoring of its status...
Delay: 5s, Duration: 0:27:23, Status: accepted
Delay: 5s, Duration: 0:27:29, Status: running
Delay: 5s, Duration: 0:27:34, Status: running
Delay: 5s, Duration: 0:27:39, Status: succeeded
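
The monitoring loop above can also be expressed as a reusable function. The sketch below is one possible arrangement, not part of Weaver: `wait_for_job` is a hypothetical helper that receives the status retrieval as a callable, which makes it possible to try the logic against a simulated status sequence without a server. Unlike the cell above, it raises an error when the timeout elapses while the job is still pending.

```python
import time


def wait_for_job(fetch_status, timeout=60, delta=5, sleep=time.sleep):
    """Poll fetch_status() until the job reaches a final status or the timeout elapses.

    fetch_status: callable returning the job status document (a dict with a 'status' key).
    Returns the final status document, or raises TimeoutError if the job is still pending.
    """
    waited = 0
    while True:
        body = fetch_status()
        status = body["status"]
        if status in ("succeeded", "failed"):
            return body
        if status not in ("accepted", "running"):
            raise ValueError(f"Unhandled job status during monitoring: [{status}]")
        if waited >= timeout:
            raise TimeoutError(f"Job still '{status}' after {waited}s")
        sleep(delta)
        waited += delta


# Simulated status sequence so the sketch runs without contacting a server.
fake_statuses = iter(["accepted", "running", "succeeded"])
result = wait_for_job(lambda: {"status": next(fake_statuses)}, sleep=lambda _: None)
print(result["status"])
```

In the notebook, `fetch_status` could be a small function that performs the same `requests.get` call on `status_location` as the cell above and returns `resp.json()`.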


#### Obtain job execution logs

Retrieve the job logs listing the execution steps accomplished by Weaver and by the underlying process, if it provided
status messages. During job execution, Weaver attempts to collect any output the original WPS produces and
integrates it within its own job logs in order to generate a sequential chain of log events for each executed step.

If the job `failed`, these logs will help us identify the cause of the problem.
Otherwise, we will have a summary of the processing steps.

**NOTE**:

> Job logs are a feature specific to Weaver that is not necessarily provided by other implementations
  of [OGC-API - Processes](https://github.com/opengeospatial/ogcapi-processes/).


In [8]:
print("Obtaining job logs from execution...")

path = f"{status_location}/logs"
resp = requests.get(path, headers=WEAVER_HEADERS, verify=WEAVER_TEST_SSL_VERIFY)
assert resp.status_code == 200, f"Failed to retrieve job logs [{path}]"
logs = resp.json()

log_lines = "\n".join(logs)
print(f"Job logs retrieved from [{path}]:\n\n{log_lines}")

assert status == "succeeded", "Job execution was not successful"


Obtaining job logs from execution...
Job logs retrieved from [https://pavics.ouranos.ca/weaver/providers/hummingbird/processes/ncdump/jobs/f6c42ea8-830c-4295-930f-29dc37fdbc10/logs]:




AssertionError: Job execution was not successful

#### Obtain the result location and output the data

When the job has `succeeded`, the result endpoint under the corresponding job provides the downloadable file references
for each of the available output IDs defined by the WPS process.

Since the sample NetCDF file provided as input is expected to be converted to raw text data, it can be displayed below.

In [None]:
print("\nJob was successful! Retrieving result location...")

# NOTE:
#   Path 'result' becomes 'results' in later versions, which should be used to match the OGC-API - Processes interface.
#   It is preserved here for backward compatibility.
path = f"{status_location}/result"
resp = requests.get(path, headers=WEAVER_HEADERS, verify=WEAVER_TEST_SSL_VERIFY)
assert resp.status_code == 200, f"Failed to retrieve job results location [{path}]"
body = resp.json()

# Here, our target output ID is named 'output' according to the process description
output = list(filter(lambda out: out["id"] == "output", body.get("outputs", [])))
assert len(output) == 1, f"Could not find result matching ID 'output' within:\n{body}"
href = output[0]["href"]
assert isinstance(href, str) and href.startswith("https://"), "Output result does not have expected reference format"

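# Apply the same SSL verification setting as the other requests, and make sure the file was actually retrieved
resp = requests.get(href, verify=WEAVER_TEST_SSL_VERIFY)
assert resp.status_code == 200, f"Failed to retrieve output file [{href}]"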
print(f"\nNCDUMP results:\n\n{resp.text}")
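
The lookup of the output reference can be isolated in a small function. The sketch below assumes the results body has the shape used in the cell above (an `outputs` list of objects with `id` and `href`); `find_output_href` is a hypothetical helper and the sample URL is a placeholder.

```python
def find_output_href(body, output_id):
    """Return the 'href' of the single output matching output_id in a results body."""
    matches = [out for out in body.get("outputs", []) if out.get("id") == output_id]
    if len(matches) != 1:
        raise LookupError(f"Expected exactly one output '{output_id}', found {len(matches)}")
    return matches[0]["href"]


# Sample results body written by hand for demonstration.
sample_results = {"outputs": [{"id": "output", "href": "https://example.org/ncdump.txt"}]}
output_href = find_output_href(sample_results, "output")
print(output_href)
```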

