## PAVICS Web Processing Services using OGC-API integration with Weaver

When the [Weaver component](https://github.com/bird-house/birdhouse-deploy/tree/master/birdhouse/components#weaver)
is enabled, all WPS *birds* registered as process *providers* are automatically accessible through the
[OGC-API - Processes][ogcapi-proc] interface at the endpoint where [Weaver][weaver] is defined.

[weaver]: https://github.com/crim-ca/weaver
[ogcapi-proc]: https://github.com/opengeospatial/ogcapi-processes/
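As a rough orientation, the sketch below shows how a provider and process identifier map onto the URL paths used throughout this notebook. The helper name `weaver_paths` is hypothetical, and it only covers the path patterns that appear in the cells of this notebook, not the full Weaver API.

```python
def weaver_paths(base_url, provider, process):
    """Build the Weaver URLs used in this notebook for one provider process.

    Hypothetical helper: only covers the path patterns exercised in this notebook.
    """
    base = base_url.rstrip("/")
    provider_url = f"{base}/providers/{provider}"
    process_url = f"{provider_url}/processes/{process}"
    return {
        "providers": f"{base}/providers",          # listing of registered WPS providers
        "processes": f"{provider_url}/processes",  # processes offered by one provider
        "process": process_url,                    # description of one process
        "jobs": f"{process_url}/jobs",             # job submission for that process
    }


paths = weaver_paths("https://pavics.ouranos.ca/weaver", "hummingbird", "ncdump")
for name, url in paths.items():
    print(f"{name:10} {url}")
```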


In [1]:
import json
import requests
import os
import time
import urllib3

WEAVER_TEST_FQDN = os.getenv("WEAVER_TEST_FQDN", os.getenv("PAVICS_HOST", "pavics.ouranos.ca"))
WEAVER_TEST_URL = os.getenv("WEAVER_TEST_URL", "https://{}/weaver".format(WEAVER_TEST_FQDN))
WEAVER_TEST_SSL_VERIFY = str(os.getenv("WEAVER_TEST_SSL_VERIFY", "true")).lower() in ["true","1","on","yes"]
WEAVER_TEST_DEFAULT_BIRDS = "catalog, finch, flyingpigeon, hummingbird, malleefowl, raven"
WEAVER_TEST_KNOWN_BIRDS = os.getenv("WEAVER_TEST_KNOWN_BIRDS", WEAVER_TEST_DEFAULT_BIRDS)
WEAVER_TEST_KNOWN_BIRDS = list(bird.strip() for bird in WEAVER_TEST_KNOWN_BIRDS.split(","))
WEAVER_TEST_DEFAULT_FILE = "/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/ta_Amon_MRI-CGCM3_decadal1980_r1i1p1_199101-200012.nc"
WEAVER_TEST_FILE = os.getenv("WEAVER_TEST_FILE", "https://{}{}".format(WEAVER_TEST_FQDN, WEAVER_TEST_DEFAULT_FILE))
WEAVER_TEST_WPS_OUTPUTS = f"https://{WEAVER_TEST_FQDN}/wpsoutputs"  # for validation

WEAVER_TEST_REQUEST_HEADERS = {"Accept": "application/json", "Content-Type": "application/json"}
WEAVER_TEST_REQUEST_XARGS = dict(headers=WEAVER_TEST_REQUEST_HEADERS, verify=WEAVER_TEST_SSL_VERIFY, timeout=5)

if not WEAVER_TEST_SSL_VERIFY:
    urllib3.disable_warnings()

print("Variables:")
variables = [
    ("WEAVER_TEST_FQDN", WEAVER_TEST_FQDN),
    ("WEAVER_TEST_URL", WEAVER_TEST_URL),
    ("WEAVER_TEST_WPS_OUTPUTS", WEAVER_TEST_WPS_OUTPUTS),
    ("WEAVER_TEST_SSL_VERIFY", WEAVER_TEST_SSL_VERIFY),
    ("WEAVER_TEST_FILE", WEAVER_TEST_FILE),
    ("WEAVER_TEST_KNOWN_BIRDS", WEAVER_TEST_KNOWN_BIRDS),
    ("WEAVER_TEST_REQUEST_XARGS", WEAVER_TEST_REQUEST_XARGS)
]
max_len = max(len(var[0]) for var in variables) + 2
msg = f"  {{:{max_len}}}{{}}"
for var, val in variables:
    print(msg.format(var, val))

    
assert len(WEAVER_TEST_KNOWN_BIRDS) >= 1, "No test WPS provider provided in 'WEAVER_TEST_KNOWN_BIRDS'."


Variables:
  WEAVER_TEST_FQDN           pavics.ouranos.ca
  WEAVER_TEST_URL            https://pavics.ouranos.ca/weaver
  WEAVER_TEST_WPS_OUTPUTS    https://pavics.ouranos.ca/wpsoutputs
  WEAVER_TEST_SSL_VERIFY     True
  WEAVER_TEST_FILE           https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/ta_Amon_MRI-CGCM3_decadal1980_r1i1p1_199101-200012.nc
  WEAVER_TEST_KNOWN_BIRDS    ['catalog', 'finch', 'flyingpigeon', 'hummingbird', 'malleefowl', 'raven']
  WEAVER_TEST_REQUEST_XARGS  {'headers': {'Accept': 'application/json', 'Content-Type': 'application/json'}, 'verify': True, 'timeout': 5}


### Define some utility functions for displaying test results

In [2]:
def json_dump(_json):
    try:
        if isinstance(_json, str):
            _json = json.loads(_json)
        return json.dumps(_json, indent=2, ensure_ascii=False)
    except Exception:
        return str(_json)


def json_print(_json):
    print(json_dump(_json))

### Start with a simple listing of registered WPS providers in Weaver


In [3]:
print("Listing WPS providers registered under Weaver...\n")

path = f"{WEAVER_TEST_URL}/providers"
query = {"detail": False, "check": False}  # skip pre-fetch to obtain results quickly (they are all checked in following cells)
resp = requests.get(path, params=query, **WEAVER_TEST_REQUEST_XARGS)
assert resp.status_code == 200, f"Error during WPS bird providers listing from [{path}]:\n{json_dump(resp.text)}"
body = resp.json()
json_print(body)

assert "providers" in body and len(body["providers"]), "Could not find Weaver WPS providers"
missing = []
for bird in WEAVER_TEST_KNOWN_BIRDS:
    if bird not in body["providers"]:
        missing.append(bird)        
assert not missing, f"Could not find all expected Weaver WPS providers.\nMissing: [{missing}]\nExpected: [{WEAVER_TEST_KNOWN_BIRDS}]"
bird_ids = body["providers"]

Listing WPS providers registered under Weaver...

{
  "checked": false,
  "providers": [
    "catalog",
    "finch",
    "flyingpigeon",
    "hummingbird",
    "malleefowl",
    "raven"
  ]
}


### Obtain the WPS processes converted to OGC-API by Weaver from the original WPS provider endpoints

For each registered provider, Weaver sends a *GetCapabilities* WPS request to the remote endpoint and parses
the XML result in order to form the corresponding OGC-API JSON content.
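To make the idea of this conversion more concrete, here is a minimal sketch that parses a hand-written WPS 1.0.0 *GetCapabilities* fragment and produces a JSON process listing. It is not Weaver's actual implementation: the function name `capabilities_to_processes` and the sample XML are assumptions for illustration, and only the identifier, title, and abstract are extracted.

```python
import json
import xml.etree.ElementTree as ET

# XML namespaces of WPS 1.0.0 and OWS 1.1 documents
NS = {
    "wps": "http://www.opengis.net/wps/1.0.0",
    "ows": "http://www.opengis.net/ows/1.1",
}

# Hand-written sample (assumption), reduced to the elements parsed below
SAMPLE_CAPABILITIES = """
<wps:Capabilities xmlns:wps="http://www.opengis.net/wps/1.0.0"
                  xmlns:ows="http://www.opengis.net/ows/1.1">
  <wps:ProcessOfferings>
    <wps:Process>
      <ows:Identifier>ncdump</ows:Identifier>
      <ows:Title>NCDump</ows:Title>
      <ows:Abstract>Run ncdump to retrieve NetCDF header metadata.</ows:Abstract>
    </wps:Process>
  </wps:ProcessOfferings>
</wps:Capabilities>
"""


def capabilities_to_processes(xml_text):
    """Convert the process offerings of a WPS capabilities document to a JSON-like dict."""
    root = ET.fromstring(xml_text.strip())
    processes = []
    for proc in root.findall("wps:ProcessOfferings/wps:Process", NS):
        processes.append({
            "id": proc.findtext("ows:Identifier", namespaces=NS),
            "title": proc.findtext("ows:Title", namespaces=NS),
            "description": proc.findtext("ows:Abstract", default="", namespaces=NS),
        })
    return {"processes": processes}


converted = capabilities_to_processes(SAMPLE_CAPABILITIES)
print(json.dumps(converted, indent=2))
```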

In [4]:
print("Listing WPS provider processes converted to OGC-API interface by Weaver:\n")

process_locations = []
for bird in WEAVER_TEST_KNOWN_BIRDS:
    path = f"{WEAVER_TEST_URL}/providers/{bird}/processes"
    resp = requests.get(path, **WEAVER_TEST_REQUEST_XARGS)
    assert resp.status_code == 200, f"Error during WPS bird processes retrieval on: [{path}]\n[{json_dump(resp.text)}]"
    body = resp.json()
    assert len(body["processes"]), f"WPS bird [{bird}] did not list any process!"
    for process in body["processes"]:
        process_desc_url = f"{path}/{process['id']}"
        process_locations.append(process_desc_url)
        print(" -", process_desc_url)

Listing WPS provider processes converted to OGC-API interface by Weaver:

 - https://pavics.ouranos.ca/weaver/providers/catalog/processes/getpoint
 - https://pavics.ouranos.ca/weaver/providers/catalog/processes/ncplotly
 - https://pavics.ouranos.ca/weaver/providers/catalog/processes/pavicrawler
 - https://pavics.ouranos.ca/weaver/providers/catalog/processes/pavicsearch
 - https://pavics.ouranos.ca/weaver/providers/catalog/processes/pavicsupdate
 - https://pavics.ouranos.ca/weaver/providers/catalog/processes/pavicsvalidate
 - https://pavics.ouranos.ca/weaver/providers/catalog/processes/period2indices
 - https://pavics.ouranos.ca/weaver/providers/catalog/processes/pavicstestdocs
 - https://pavics.ouranos.ca/weaver/providers/finch/processes/tg
 - https://pavics.ouranos.ca/weaver/providers/finch/processes/wind_speed_from_vector
 - https://pavics.ouranos.ca/weaver/providers/finch/processes/wind_vector_from_speed
 - https://pavics.ouranos.ca/weaver/providers/finch/processes/prsn
 - https://p

### Dispatched execution of Hummingbird WPS process

Here, we attempt running the same process defined in [WPS_example Notebook](../notebooks/WPS_example.ipynb), but
through the OGC-API interface provided by Weaver.

The process execution received by Weaver gets dispatched to the real WPS location. Weaver then
monitors the process until completion and, once completed, returns the location where results can be retrieved.

In [5]:
assert "hummingbird" in WEAVER_TEST_KNOWN_BIRDS, (
    "Hummingbird not specified within known WPS provider birds by Weaver. Cannot test dispatched process execution..."
)

WEAVER_BIRD_URL = f"{WEAVER_TEST_URL}/providers/hummingbird"
WEAVER_BIRD_PROCESS_URL = f"{WEAVER_BIRD_URL}/processes/ncdump"
assert WEAVER_BIRD_PROCESS_URL in process_locations, (
    f"Could not find WPS bird process URL to test execution [{WEAVER_BIRD_PROCESS_URL}]."
)

print(f"Will run process: [{WEAVER_BIRD_PROCESS_URL}]")


Will run process: [https://pavics.ouranos.ca/weaver/providers/hummingbird/processes/ncdump]


#### First let's obtain the specific description of the test WPS process

This request will tell us the explicit details of the process such as its inputs, outputs, and other metadata.
Weaver parses the results retrieved from the original WPS provider using a *DescribeProcess* request to
generate the corresponding outputs. Weaver also adds metadata when it can infer some missing
details from the returned description fields.
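A process description can be long, so a compact summary of its inputs is sometimes handy. The sketch below assumes the description lists its inputs as a mapping keyed by input ID, as in the response shown in this notebook. The helper name `summarize_inputs` and the sample dictionary are illustrative only.

```python
def summarize_inputs(description):
    """Return one summary line per input: ID, cardinality, and accepted media types."""
    lines = []
    for input_id, definition in description.get("inputs", {}).items():
        min_occurs = definition.get("minOccurs", 1)
        max_occurs = definition.get("maxOccurs", 1)  # may also be a string such as "unbounded"
        media_types = [fmt["mediaType"] for fmt in definition.get("formats", []) if "mediaType" in fmt]
        required = "required" if int(min_occurs) > 0 else "optional"
        formats = ", ".join(media_types) or "n/a"
        lines.append(f"{input_id}: {required}, occurs {min_occurs}..{max_occurs}, formats: {formats}")
    return lines


# Reduced sample based on the 'dataset' input of the 'ncdump' description
sample_description = {
    "id": "ncdump",
    "inputs": {
        "dataset": {
            "title": "Dataset",
            "minOccurs": 0,
            "maxOccurs": 100,
            "formats": [{"default": True, "mediaType": "application/x-netcdf"}],
        }
    },
}

summary = summarize_inputs(sample_description)
print("\n".join(summary))
```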

In [6]:
print("Getting WPS process description...\n")

resp = requests.get(WEAVER_BIRD_PROCESS_URL, **WEAVER_TEST_REQUEST_XARGS)
assert resp.status_code == 200, f"Error getting WPS process description:\n[{json_dump(resp.text)}]"
body = resp.json()
json_print(body)

Getting WPS process description...

{
  "id": "ncdump",
  "title": "NCDump",
  "description": "Run ncdump to retrieve NetCDF header metadata.",
  "keywords": [
    "hummingbird",
    "Hummingbird",
    "wps-remote"
  ],
  "metadata": [
    {
      "title": "Birdhouse",
      "href": "http://bird-house.github.io/",
      "rel": "birdhouse"
    },
    {
      "title": "User Guide",
      "href": "http://birdhouse-hummingbird.readthedocs.io/en/latest/",
      "rel": "user-guide"
    }
  ],
  "inputs": {
    "dataset": {
      "title": "Dataset",
      "description": "Enter a URL pointing to a NetCDF file (optional)",
      "minOccurs": 0,
      "maxOccurs": 100,
      "formats": [
        {
          "default": true,
          "mediaType": "application/x-netcdf"
        }
      ]
    },
    "dataset_opendap": {
      "title": "Remote OpenDAP Data URL",
      "description": "Or provide a remote OpenDAP data URL, for example: http://my.opendap/thredds/dodsC/path/to/file.nc",
      "minOccur

#### Submit the new process execution

Using the OGC-API interface, a WPS process execution is accomplished using a *Job*. That job provides the status
location where we can monitor the process execution.

From the previous response, we can see that the process accepts many inputs and format variations.
In this case, we are interested in the input named `dataset_opendap` to submit the file defined by `WEAVER_TEST_FILE`,
since that file is referenced by an OpenDAP URL.

Following execution of the process, we expect to obtain a raw text data dump of the test file content.
The location of the raw text file is expected to be provided by the output named `output` according to the process description.
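The request body for such a job follows a regular shape, which the following sketch assembles from a mapping of literal inputs and a list of output IDs. The helper name `make_execute_body` is hypothetical. It only handles literal values passed through `data` and outputs requested by reference, which is the combination used in this notebook.

```python
def make_execute_body(inputs, output_ids):
    """Assemble a job submission body for literal inputs and outputs returned by reference."""
    return {
        "mode": "async",         # non-blocking execution, monitored through the job status location
        "response": "document",  # status response type used in this notebook
        "inputs": [{"id": key, "data": value} for key, value in inputs.items()],
        "outputs": [{"id": out_id, "transmissionMode": "reference"} for out_id in output_ids],
    }


example_body = make_execute_body(
    {"dataset_opendap": "http://my.opendap/thredds/dodsC/path/to/file.nc"},
    ["output"],
)
print(example_body)
```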

In [1]:
print("Submitting process job with:")
print("  File:     [{}]".format(WEAVER_TEST_FILE))
print("  Process:  [{}]".format(WEAVER_BIRD_PROCESS_URL))

data = {
  "mode": "async",  # This tells Weaver to run the process asynchronously, such that we get non-blocking status location
  "response": "document",  # Type of status response (only this mode supported for the time being)
  "inputs": [
    {
      "id": "dataset_opendap",  # Target input of the process
      "data": WEAVER_TEST_FILE  # Note: even though this is an URL, the expected type is a 'string' (not a 'File'), so 'data' must be used instead of 'href'
    }
  ],
  "outputs": [
    {
      "id": "output",   # Target output we want to retrieve
      "transmissionMode": "reference"  # Ask to provide the result as HTTP reference
    }
  ]
}

path = f"{WEAVER_BIRD_PROCESS_URL}/jobs"
resp = requests.post(path, json=data, **WEAVER_TEST_REQUEST_XARGS)
assert resp.status_code in [200, 201], f"Error during WPS job submission:\n{json_dump(resp.text)}"
status_location = resp.headers.get("Location")
assert status_location, "Could not find status location URL"
print(f"Job Status Location: [{status_location}]")


Submitting process job with:
  File:     [https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/testdata/ta_Amon_MRI-CGCM3_decadal1980_r1i1p1_199101-200012.nc]
  Process:  [https://pavics.ouranos.ca/weaver/providers/hummingbird/processes/ncdump]
Job Status Location: [https://pavics.ouranos.ca/weaver/providers/hummingbird/processes/ncdump/jobs/a4994551-4799-461a-a7bb-1e91aeace32c]


#### Monitor execution until completion

Now, we wait until the process completes by periodically verifying the provided status location of the job.
The job will be running asynchronously and will be gradually updated with progression and logging details.

Following the job submission request, the `status` can be either `accepted` if the job is still queued pending execution, or
already `running`. Once the job completes, the `status` should indicate that it either `succeeded` or `failed`.
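The status transitions described above can be captured in a small reusable polling helper. This is only a sketch: the name `wait_for_job` and its parameters are hypothetical, and the status retrieval is injected as a function so that the logic can be tried without a running server.

```python
import time


def wait_for_job(fetch_status, delay=5, timeout=20, sleep=time.sleep):
    """Poll 'fetch_status' until a final status is returned or 'timeout' seconds were waited."""
    waited = 0
    while True:
        status = fetch_status()
        if status in ("succeeded", "failed"):
            return status
        if status not in ("accepted", "running"):
            raise ValueError(f"Unhandled job status during monitoring: [{status}]")
        if waited >= timeout:
            raise TimeoutError(f"Job still [{status}] after waiting {waited}s")
        sleep(delay)
        waited += delay


# Simulated sequence of statuses instead of real HTTP requests
fake_statuses = iter(["accepted", "running", "succeeded"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

With a real job, `fetch_status` could wrap a `requests.get` call on the job status location and return the `status` field of the JSON response.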


In [1]:
# NBVAL_IGNORE_OUTPUT
# ignore status updates of job monitoring

print("Waiting for job completion by polling its status...")

timeout = 20  # Define a timeout to abandon this monitoring. Process is relatively quick and shouldn't last too long.
delta = 5
body = {}
while timeout >= 0:
    resp = requests.get(status_location, **WEAVER_TEST_REQUEST_XARGS)
    assert resp.status_code == 200, "Failed retrieving job status at location [{}]".format(status_location)
    body = resp.json()
    timeout -= delta
    if body["status"] in ["accepted", "running"]:
        print(f"Delay: {delta}s, Duration: {body['duration']}, Status: {body['status']}")
        time.sleep(delta)
        continue
    if body["status"] in ["failed", "succeeded"]:
        print(f"Final job status: [{body['status']}]")
        break
    raise ValueError(f"Unhandled job status during monitoring: [{body['status']}]")
# check the last reported status instead of the remaining time, so a job finishing on the last poll is not a timeout
assert body.get("status") in ["failed", "succeeded"], "Timeout reached. Process job never finished."

# note: don't assert the process success/failure yet, to retrieve more details in case it failed
assert body and "status" in body, f"Could not retrieve job status [{status_location}]"
status = body["status"]

Waiting for job completion by polling its status...
Delay: 5s, Duration: 00:00:00, Status: accepted
Final job status: [succeeded]


#### Obtain job execution logs

Retrieve the job logs listing the execution steps accomplished by Weaver and by the underlying process, if it provided
status messages. During job execution, Weaver attempts to collect any output the original WPS produces and
integrates it within its own job logs in order to generate a sequential chain of log events for each executed step.

If the job execution `failed`, this log will help us identify the cause of the problem.
Otherwise, we will have a summary of the processing steps.

**NOTE**:

> Job logs are a feature specific to Weaver that other implementations
  of [OGC-API - Processes](https://github.com/opengeospatial/ogcapi-processes/) do not necessarily provide.
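When troubleshooting, it can help to split each log line into its fields. The sketch below does so with a regular expression. The line layout is inferred only from the log output recorded in this notebook and may differ in other Weaver versions. The name `parse_log_line` is hypothetical.

```python
import re

# Layout assumed from this notebook's output:
#   [timestamp] LEVEL [logger] duration progress% status message
LOG_PATTERN = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+(?P<level>\w+)\s+\[(?P<logger>[^\]]+)\]\s+"
    r"(?P<duration>\S+)\s+(?P<progress>\d+)%\s+(?P<status>\w+)\s+(?P<message>.*)$"
)


def parse_log_line(line):
    """Split one job log line into its fields, or return None if the layout does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["progress"] = int(entry["progress"])
    return entry


sample_line = (
    "[2021-10-13 16:53:04] INFO     [weaver.datatype.Job] 00:00:00   1% accepted   Job task setup completed."
)
parsed = parse_log_line(sample_line)
print(parsed)
```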


In [1]:
print("Obtaining job logs from execution...")

path = f"{status_location}/logs"
resp = requests.get(path, **WEAVER_TEST_REQUEST_XARGS)
assert resp.status_code == 200, f"Failed to retrieve job logs [{path}]"
logs = resp.json()

log_lines = "\n".join(logs)
print(f"Job logs retrieved from [{path}]:\n\n{log_lines}")

assert status == "succeeded", "Job execution was not successful"


Obtaining job logs from execution...
Job logs retrieved from [https://pavics.ouranos.ca/weaver/providers/hummingbird/processes/ncdump/jobs/a4994551-4799-461a-a7bb-1e91aeace32c/logs]:

[2021-10-13 16:53:04] INFO     [weaver.datatype.Job] 00:00:00   1% accepted   Job task setup completed.
[2021-10-13 16:53:04] DEBUG    [weaver.datatype.Job] 00:00:00   2% accepted   Employed WPS URL: [https://pavics.ouranos.ca/twitcher/ows/proxy/hummingbird]
[2021-10-13 16:53:04] INFO     [weaver.datatype.Job] 00:00:00   2% accepted   Execute WPS request for process [ncdump]
[2021-10-13 16:53:04] INFO     [weaver.datatype.Job] 00:00:00   3% accepted   Fetching job input definitions.
[2021-10-13 16:53:04] INFO     [weaver.datatype.Job] 00:00:00   4% accepted   Fetching job output definitions.
[2021-10-13 16:53:05] INFO     [weaver.datatype.Job] 00:00:00   5% accepted   Starting job process execution.
[2021-10-13 16:53:05] INFO     [weaver.datatype.Job] 00:00:00   5% accepted   Following updates could take 

#### Obtain the result location and output the data

When the job has `succeeded`, the results endpoint under the corresponding job provides the downloadable file references
for each of the available output IDs defined by the WPS process.

Since the sample NetCDF file provided as input is expected to be converted to raw text data, it can be displayed below.
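For processes with several outputs, the references can be collected in one pass. The sketch below assumes the results body maps each output ID to an object holding an `href`, which is the shape used for `output` in this notebook. The helper name `output_references` is hypothetical.

```python
def output_references(results):
    """Map each output ID to its file reference, skipping outputs without an 'href'."""
    return {
        output_id: output["href"]
        for output_id, output in results.items()
        if isinstance(output, dict) and "href" in output
    }


# Sample results body shaped like the one handled in this notebook
sample_results = {
    "output": {
        "href": "https://pavics.ouranos.ca/wpsoutputs/weaver/a4994551-4799-461a-a7bb-1e91aeace32c/output.txt"
    }
}
references = output_references(sample_results)
for output_id, href in references.items():
    print(f"{output_id}: {href}")
```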

In [1]:
print("\nJob was successful! Retrieving result location...")

# NOTE:
#   Older versions exposed this path as 'result'. It becomes 'results' in later versions,
#   which is the path employed here for the same interface as OGC-API - Processes.
path = f"{status_location}/results"
resp = requests.get(path, **WEAVER_TEST_REQUEST_XARGS)
assert resp.status_code == 200, f"Failed to retrieve job results location [{path}]"
body = resp.json()

# Here, our target output ID is named 'output' according to the process description
output = body.get("output")
assert isinstance(output, dict), f"Could not find result matching ID 'output' within:\n{body}"
href = output["href"]
assert isinstance(href, str) and href.startswith(WEAVER_TEST_WPS_OUTPUTS), f"Output result location does not have expected reference format: [{href}]"
print(f"Result is located at: [{href}]\n")

print("Fetching output contents...")
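resp = requests.get(href, verify=WEAVER_TEST_SSL_VERIFY, timeout=5)  # apply the same SSL and timeout settings as other requests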
print(f"\nNCDUMP 'output' result content:\n\n{resp.text}")




Job was successful! Retrieving result location...
Result is located at: [https://pavics.ouranos.ca/wpsoutputs/weaver/a4994551-4799-461a-a7bb-1e91aeace32c/output.txt]

Fetching output contents...

NCDUMP 'output' result content:

<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.13.6</center>
</body>
</html>

