Weaver supports multiple types of processes, as listed below. Each of them is accessible through the same API interface, but they have different implications.
Section _ provides multiple concrete use cases of Deploy <proc_op_deploy> and Execute <proc_op_execute> request payloads for a diverse set of applications.
These processes come pre-packaged with Weaver. They are available directly on startup of the application and are re-registered on each boot to make sure internal database references stay up to date with any source code changes.
These processes typically correspond to utility operations. They are especially useful when employed as steps within a Workflow process that requires data-type conversion between inputs/outputs of similar, but not perfectly compatible, definitions.
For example, process weaver.processes.builtin.jsonarray2netcdf takes a single input JSON file whose content contains an array-list of NetCDF file references, and returns them directly as the corresponding list of output files. These two different file formats (single JSON to multiple NetCDF) can then be used to map two processes with these respective outputs and inputs.
As of the latest release, the following builtin processes are available:
- weaver.processes.builtin.file2string_array
- weaver.processes.builtin.jsonarray2netcdf
- weaver.processes.builtin.metalink2netcdf
All builtin processes are marked with weaver.processes.constants.CWL_REQUIREMENT_APP_BUILTIN in the CWL hints section and are all defined in weaver.processes.builtin.
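To illustrate the idea, the following is a conceptual sketch of what jsonarray2netcdf accomplishes (this is not Weaver's actual implementation, and the function name and behavior here are simplified for illustration):

```python
import json

def jsonarray2netcdf(json_file_path):
    # Read the single JSON input file, whose content is expected to be
    # an array of NetCDF file references.
    with open(json_file_path, encoding="utf-8") as f:
        references = json.load(f)
    if not isinstance(references, list):
        raise ValueError("expected a JSON array of NetCDF file references")
    # Return the references directly as the corresponding list of outputs.
    return [str(ref) for ref in references]
```

A Workflow step producing a JSON file of references can thus be chained to a step consuming multiple NetCDF files.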
This kind of process corresponds to a traditional WPS XML or JSON endpoint (depending on the supported version) prior to the WPS-REST specification. When a WPS-REST process is deployed in Weaver using a URL reference to a WPS-1/2 process, Weaver parses and converts the XML or JSON body of the response and registers the process locally using this definition. This allows a remote server offering limited functionalities (e.g.: no REST bindings supported) to provide them through Weaver.
A minimal Deploy <proc_op_deploy>
request body for this kind of process could be as follows:
{
  "processDescription": {
    "process": {
      "id": "my-process-reference"
    }
  },
  "executionUnit": [
    {
      "href": "https://example.com/wps?service=WPS&request=DescribeProcess&identifier=my-process&version=1.0.0"
    }
  ]
}
This would tell Weaver to locally Deploy <proc_op_deploy> the my-process-reference process using the WPS-1 URL reference that is expected to return a DescribeProcess XML schema. Provided that this endpoint can be resolved and parsed according to the typical WPS specification, this should result in a successful process registration. The deployed Process would then be accessible with DescribeProcess <proc_op_describe> requests.
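Such a Deploy body can be produced programmatically. The following sketch builds the minimal payload shown above; the helper name is hypothetical and only the payload shape comes from this document:

```python
def build_wps1_deploy_payload(process_id, describe_process_url):
    # Minimal Deploy body for a remote WPS-1 process reference,
    # matching the payload shape shown above.
    return {
        "processDescription": {"process": {"id": process_id}},
        "executionUnit": [{"href": describe_process_url}],
    }
```

The resulting dictionary would then be submitted as the JSON body of the Deploy request.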
The above deployment procedure can be automated on startup using Weaver's wps_processes.yml
configuration file. Please refer to Configuration of WPS Processes
section for more details on this matter.
Warning
Because Weaver creates a snapshot of the reference process at the moment it was deployed, the local process definition could become out-of-sync with the remote reference where the Execute <proc_op_execute> request will be sent. Refer to the Remote Provider section for more details on working around this issue.
This Process
type is the main component of Weaver. All other types are converted to this one either through some parsing (e.g.: WPS-1/2) or with some requirement indicators (e.g.: Builtin, Workflow) for special handling.
When deploying such a Process directly, it is expected to have a definition specified with a CWL Application Package. This is most of the time employed to wrap an operation packaged in a referenced Docker image. The reference package can be provided in multiple ways, as presented below.
Note
When a process is deployed with any of the below supported Application Package formats, additional parsing of this CWL, as well as of complementary details provided directly within the WPS deployment body, is accomplished. See the cwl-wps-mapping section for more details.
In this situation, the CWL definition is provided as-is using a JSON-formatted package embedded within the _ request. The request payload would take the following shape:
{
  "processDescription": {
    "process": {
      "id": "my-process-literal-package"
    }
  },
  "executionUnit": [
    {
      "unit": {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "inputs": ["<...>"],
        "outputs": ["<...>"],
        "<...>": "<...>"
      }
    }
  ]
}
In this situation, the CWL
is provided indirectly using an external file reference which is expected to have contents describing the Application Package
(as presented in the app_pkg_exec_unit_literal
case). Because an external file is employed instead of embedding the package within the JSON
HTTP request contents, it is possible to employ both JSON
and YAML
definitions.
An example is presented below:
{
  "processDescription": {
    "process": {
      "id": "my-process-reference-package"
    }
  },
  "executionUnit": [
    {
      "href": "https://remote-file-server.com/my-package.cwl"
    }
  ]
}
Where the referenced file hosted at "https://remote-file-server.com/my-package.cwl"
could contain:
cwlVersion: "v1.0"
class: CommandLineTool
inputs:
  - "<...>"
outputs:
  - "<...>"
"<...>": "<...>"
For the traditional WPS-1 process type, Weaver adds default values to the CWL definition. As can be seen in weaver/processes/wps_package.py, the default values for the CWL package are:
cwl_package = OrderedDict([
    ("cwlVersion", "v1.0"),
    ("class", "CommandLineTool"),
    ("hints", {
        CWL_REQUIREMENT_APP_WPS1: {
            "provider": get_url_without_query(wps_service_url),
            "process": process_id,
        },
    }),
])
For ESGF-CWT processes, the ESGF-CWTRequirement hint must be used instead of the usual WPS1Requirement, contained in the weaver.processes.constants.CWL_REQUIREMENT_APP_WPS1 variable. This technicality is handled in weaver/processes/wps_package.py. We can define ESGF-CWT processes using this syntax:
{
  "cwlVersion": "v1.0",
  "class": "CommandLineTool",
  "hints": {
    "ESGF-CWTRequirement": {
      "provider": "https://edas.nccs.nasa.gov/wps/cwt",
      "process": "xarray.subset"
    }
  }
}
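The choice between the two hints can be sketched with a small helper. This is only an illustration of the structures shown above; the helper name is hypothetical and not part of Weaver:

```python
def make_remote_cwl(provider_url, process_id, esgf_cwt=False):
    # Choose the hint name according to the remote process flavor:
    # ESGF-CWT processes use "ESGF-CWTRequirement", regular WPS-1
    # processes use "WPS1Requirement".
    requirement = "ESGF-CWTRequirement" if esgf_cwt else "WPS1Requirement"
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "hints": {
            requirement: {"provider": provider_url, "process": process_id},
        },
    }
```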
Processes categorized as Workflow are very similar to WPS-REST processes. From the API standpoint, they actually look exactly the same as an atomic process when calling DescribeProcess <proc_op_describe> or Execute <proc_op_execute> requests. The difference lies within the referenced Application Package, which uses a CWL Workflow instead of the typical CWL CommandLineTool, and therefore modifies how the Process is internally executed.
For Workflow processes to be deployable and executable, it is mandatory that Weaver is configured as EMS or HYBRID (see: Configuration Settings). This requirement is due to the nature of Workflow, which chains processes that need to be dispatched to known remote ADES servers (see: conf_data_sources and proc_workflow_ops) according to the defined Data Source configuration.
Given that a Workflow
process was successfully deployed and that all process steps can be resolved, calling its Execute <proc_op_execute>
request will tell Weaver to parse the chain of operations and send step process execution requests to relevant ADES
picked according to Data Source
. Each step's job will then gradually be monitored from the relevant ADES
until completion.
Upon successful intermediate result, the EMS
(or HYBRID
acting as such) will stage the data references locally to chain them to the following step. When the complete chain succeeds, the final results of the last step will be provided as Workflow
output in the same manner as for atomic processes. In case of failure, the error will be indicated in the logs with the appropriate step and message where the error occurred.
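The chaining described above can be sketched as a toy loop. Everything here (function names, the shape of the dispatch callback) is hypothetical and only mirrors the described behavior, not Weaver's internals:

```python
def run_workflow(steps, dispatch):
    # 'dispatch' picks the ADES executor for a given step (the Data
    # Source resolution); each step receives the staged outputs of
    # the previous one.
    staged = {}
    for step in steps:
        execute_step = dispatch(step)
        staged = execute_step(step, staged)  # execute + monitor until done
    return staged  # outputs of the last step are the Workflow outputs
```

A failure in any step would abort the loop, which corresponds to the Workflow error reporting described above.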
Note
Although chaining sub-workflow(s) within a bigger scoped Workflow is technically possible, this has not yet been fully explored (tested) in Weaver. There is a chance that Data Source_ resolution fails to identify where to dispatch the step in this situation. If this impacts you, please vote and indicate your concern on issue #171.
proc_workflow_ops provides more details on each of the internal operations accomplished by the individual step Processes chained in a Workflow.
A remote provider corresponds to a remote service that provides interfaces similar to those supported by Weaver (WPS-like). For example, a remote WPS-1 XML endpoint can be referenced as a provider. When an API Providers-scoped request is executed, for example to list its process capabilities (see GetCapabilities <proc_op_getcap>), Weaver will send the corresponding request using the registered reference URL to access the remote server and reply with the parsed response, as if its processes were registered locally.
Since remote providers obviously require access to the remote service, Weaver will only be able to provide results if the service is accessible with respect to standard implementation features and supported specifications.
The main advantage of using Weaver's endpoint rather than directly accessing the referenced remote provider processes lies in the case of limited functionality offered by the service. For instance, WPS-1 does not always offer the proc_op_status feature, and there is no extensive job monitoring available. Since Weaver wraps the original reference with its own endpoints, these features indirectly become employable. Similarly, although WPS-1 offers XML-only endpoints, the parsing operation accomplished by Weaver makes these services available as WPS-REST JSON endpoints. On top of that, registering a remote Provider into Weaver allows users to employ it as a central hub to keep references to all their accessible services and dispatch Job executions from a common location.
A remote provider differs from the previously presented WPS-1/2 processes in that the underlying processes of the service are not registered locally. For example, if a remote service has two WPS processes, only the top-level service URL will be registered locally (in Weaver's database) and the application will have no explicit knowledge of these remote processes. When calling process-specific requests (e.g.: DescribeProcess <proc_op_describe> or Execute <proc_op_execute>), Weaver will re-send the corresponding request directly to the remote provider each time and return the result accordingly. On the other hand, a WPS-1/2 reference would be parsed and saved locally with the response at the time of deployment. This means that a deployed WPS-1/2 reference acts as a snapshot of the reference (which could become out-of-sync), while a Remote Provider will dynamically update according to the re-fetched response from the remote service. If our example remote service was extended with a third WPS process, it would immediately be reflected in GetCapabilities <proc_op_getcap> and DescribeProcess <proc_op_describe> results retrieved via Weaver Providers-scoped requests. This would not be the case for the WPS-1/2 reference, which would need a manual update (deploying the third process to register it in Weaver).
An example body of the register provider request could be as follows:
{
  "id": "my-service",
  "url": "https://example.com/wps",
  "public": true
}
Then, processes of this registered proc_remote_provider will be accessible. For example, if the service referenced by the above URL adds a WPS process identified as my-process, its JSON description would be obtained with the following request (DescribeProviderProcess):
GET {WEAVER_URL}/providers/my-service/processes/my-process
Note
Process my-process
in the example is not registered locally. From the point of view of Weaver's processes (i.e.: route /processes/{id}
), it does NOT exist. You must absolutely use the provider-prefixed route /providers/{id}/processes/{id}
to explicitly fetch and resolve this remote process definition.
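The routing distinction can be captured in a small URL helper. This is an illustrative sketch based only on the route patterns shown above; the function name is hypothetical:

```python
def process_description_url(weaver_url, process_id, provider_id=None):
    # Local processes live under /processes/{id}; remote provider
    # processes must use the provider-prefixed route
    # /providers/{id}/processes/{id}.
    base = weaver_url.rstrip("/")
    if provider_id is not None:
        return f"{base}/providers/{provider_id}/processes/{process_id}"
    return f"{base}/processes/{process_id}"
```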
Warning
API requests scoped under Providers are a Weaver-specific implementation. These are not part of the _ specification.
The following represents the typical steps applied to deploy a process, execute it, and retrieve the results.
Deployment of a new process is accomplished through the POST {WEAVER_URL}/processes
_ request.
The request body mainly requires two components:

processDescription:
    Defines the process identifier, metadata, inputs, outputs, and some execution specifications. This mostly corresponds to information that is provided by a traditional WPS definition.

executionUnit:
    Defines the core details of the Application Package. This corresponds to the explicit CWL definition that indicates how to execute the given application.
Upon receiving the deploy request, Weaver will either respond with a successful result or with the appropriate error message, whether caused by a conflicting ID, invalid definitions or other parsing issues. A successful process deployment will result in this process becoming available for the following steps.
Warning
When a process is deployed, it is not necessarily available immediately. This is because process visibility also needs to be updated. The process must be made public to allow its discovery. Alternatively, the visibility can be directly provided within the body of the deploy request to skip this extra step. For specifying or updating visibility, please refer to corresponding _ and _ requests.
After the deployment and visibility preconditions have been met, the corresponding process should become available through DescribeProcess <proc_op_describe> requests and other routes that depend on an existing process.
Note that when a process is deployed using the WPS-REST interface, it also becomes available through the WPS-1/2 interface with the same identifier and definition. Because of compatibility limitations, some parameters on the WPS-1/2 side might not be perfectly mapped to the equivalent or adjusted WPS-REST interface, although this mostly concerns newer features such as Job status monitoring. For most traditional use cases, properties are mapped between the two interfaces, but it is recommended to use the WPS-REST one because of the added features.
Please refer to application-package
chapter for any additional parameters that can be provided for specific types of Application Package
and Process
definitions.
Available processes can all be listed using the _ request. This request will return all locally registered process summaries. Other return formats and filters are also available according to the provided request query parameters. Note that processes not marked with public visibility will not be listed in this result.
For more specific process details, the _ request should be used. This will return all the information that defines the process references and expected inputs/outputs.
Note
For remote processes (see: Remote Provider), Provider requests are also available for more fine-grained search of the underlying processes. These processes are not necessarily listed as local processes, and will therefore sometimes not yield any result if using the typical DescribeProcess request on the wps_endpoint.
All routes listed under Process requests should normally be applicable for remote processes by prefixing them with /providers/{id}
.
Changed in version 4.20.
With the addition of Process
revisions (see Update Operation <proc_op_update> below), a registered Process
specified only by {processID}
will retrieve the latest revision of that Process
. A specific older revision can be obtained by adding the tagged version in the path ({processID}:{version}
) or adding the request query parameter version
.
Using revisions provided through PUT
and PATCH
requests, it is also possible to list specific or all existing revisions of a given or multiple processes simultaneously using the revisions
and version
query parameters with the _ request.
Since Weaver supports _, it is able to remove a previously registered Process using the Deployment <proc_op_deploy> request. The undeploy operation consists of a DELETE request targeting the specific {WEAVER_URL}/processes/{processID} to be removed.
Note
The Process
must be accessible by the user considering any visibility configuration to perform this step. See proc_op_deploy
section for details.
Added in version 4.20.
Starting from version 4.20, a Process
can be replaced or updated using respectively the PUT
and PATCH
requests onto the specific {WEAVER_URL}/processes/{processID}
location of the reference to modify.
Note
The Process
partial update operation (using PATCH
) is specific to Weaver only. _ only mandates the definition of PUT
request for full override of a Process
.
When a Process
is modified using the PATCH
operation, only the new definitions need to be provided, and unspecified items are transferred over from the referenced Process
(i.e.: the previous revision). Using either the PUT
or PATCH
requests, previous revisions can be referenced using two formats:
- {processID}:{version} as the request path parameter (instead of the usual {processID} only)
- {processID} in the request path combined with the ?version={version} query parameter
Weaver employs MAJOR.MINOR.PATCH semantic versioning to maintain revisions of updated or replaced Process definitions. The next revision number to employ for an update or replacement can either be provided explicitly in the request body using a version field, or be omitted. When omitted, the next revision will be guessed automatically based on the previous available revision, according to the level of changes required. In either case, the resolved version will have to be available and respect the expected update level to be accepted as a new valid Process revision. The applicable revision level depends on the contents being modified via the submitted request body fields, according to the following table. When a combination of the below items occurs, the higher update level is required.
HTTP Method | Level | Change | Examples
---|---|---|---
PATCH | PATCH | Modifications to metadata not impacting the Process execution or definition. |
PATCH | MINOR | Modification that impacts how the Process could be executed, but not its definition. |
PUT | MAJOR | Modification that impacts what the Process executes. |
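The automatic version resolution can be sketched as follows. This is a simplified illustration of the levels in the table above, not Weaver's actual resolution code:

```python
def next_revision(current, level):
    # Compute the next MAJOR.MINOR.PATCH revision from the update level.
    major, minor, patch = (int(part) for part in current.split("."))
    if level == "MAJOR":
        return f"{major + 1}.0.0"
    if level == "MINOR":
        return f"{major}.{minor + 1}.0"
    if level == "PATCH":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown update level: {level}")
```

For example, a PATCH-level update of 1.2.3 resolves to 1.2.4, and a subsequent MINOR-level update resolves to 1.3.0, matching the revisions discussed below.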
Note
For all applicable fields of updating a Process
, refer to the schema of _. For replacing a Process
, refer instead to the schema of _. The replacement request contents are extremely similar to the Deploy <proc_op_deploy>
schema since the full Process
definition must be provided.
For example, if test-process:1.2.3 was previously deployed and is the active latest revision of that Process, submitting the below request body will produce a PATCH revision as test-process:1.2.4.
../examples/update-process-patch.http
../examples/update-process-patch.http
Here, only metadata is adjusted and there is no risk of impacting the produced results or execution methods of the Process. An external user would probably not even notice that the Process changed, which is why PATCH is reasonable in this case. Notice that the version is not explicitly provided in the body; it is guessed automatically from the modified contents. The example also shows how Process-level and inputs/outputs-level metadata can be updated.
Similarly, the following request would produce a MINOR revision of test-process. Since both PATCH and MINOR level contents are defined in the update, the higher MINOR revision is required. In this case, MINOR is required because jobControlOptions (forced to asynchronous execution for the following versions) would break any future request made by users who expect the Process to run (or support) synchronous execution.
Notice that this time, the Process
reference does not indicate the revision in the path (no :1.2.4
part). This automatically resolves to the updated revision test-process:1.2.4
that became the new latest revision following our previous PATCH
request.
../examples/update-process-minor.http
In this case, the desired version (1.4.0) is also specified explicitly in the body. Since the updated number (MINOR = 4) matches the expected update level from the above table and respects a higher level than the referenced 1.2.4 Process, this revision value will be accepted (instead of the auto-resolved 1.3.0 otherwise). Note that if 2.4.0 was specified instead, the version would be refused, as Weaver does not consider this modification to be worth a MAJOR revision, and tries to keep version levels consistent. Skipping numbers (i.e.: 1.3.0 in this case) is permitted as long as there are no other versions above of the same level (i.e.: 1.4.0 would be refused if 1.5.0 existed). This allows some flexibility with revisions in case users want to use specific numbering values that have more meaning to them. It is recommended to let Weaver auto-update version values between updates if this level of fine-grained control is not required.
Note
To avoid conflicting definitions, a Process cannot be Deployed <proc_op_deploy> directly using a {processID}:{version} reference. Deployments are expected to be the first revision and should only include the {processID} portion as their identifier.
If the user desires to deploy a specific version, the PUT request should be used with the appropriate version within the request body. It is however up to the user to provide the full definition of that Process, as the PUT request will completely replace the previous definition rather than carry over previous updates (i.e.: PATCH requests).
Even when a Process
is "replaced" using PUT
, the older revision is not actually removed and undeployed (DELETE
request). It is therefore still possible to refer to the old revision using explicit references with the corresponding version
. Weaver keeps track of revisions by corresponding {processID}
entries such that if the latest revision is undeployed, the previous revision will automatically become the latest once again. For complete replacement, the user should instead perform a DELETE
of all existing revisions (to avoid conflicts) followed by a new Deploy <proc_op_deploy>
request.
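The latest-revision fallback described above can be sketched with a toy registry. The class and its methods are hypothetical and only illustrate the described bookkeeping:

```python
class ProcessRevisions:
    def __init__(self):
        # processID -> versions in deployment order (latest last)
        self._revisions = {}

    def deploy(self, process_id, version):
        self._revisions.setdefault(process_id, []).append(version)

    def latest(self, process_id):
        versions = self._revisions.get(process_id) or []
        return versions[-1] if versions else None

    def undeploy(self, process_id, version):
        # Removing the latest revision makes the previous one the
        # latest once again, as described above.
        self._revisions[process_id].remove(version)
```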
Process
execution (i.e.: submitting a Job
) is accomplished using the _ request.
Note
For backward compatibility, the _ request is also supported as alias to the above OGC API - Processes
compliant endpoint.
This section will first describe the basics of this request format, and then go into details for specific use cases and parametrization of various input/output combinations. Let's employ the following examples of JSON bodies sent for Job execution to better illustrate the requirements.
Listing representation:

{
  "mode": "async",
  "response": "document",
  "inputs": [
    {
      "id": "input-file",
      "href": "<some-file-reference>"
    },
    {
      "id": "input-value",
      "data": 1
    }
  ],
  "outputs": [
    {
      "id": "output",
      "transmissionMode": "reference"
    }
  ]
}

Mapping representation:

{
  "mode": "async",
  "response": "document",
  "inputs": {
    "input-file": {
      "href": "<some-file-reference>"
    },
    "input-value": {
      "value": 1
    }
  },
  "outputs": {
    "output": {
      "transmissionMode": "reference"
    }
  }
}
Note
For backward compatibility, the execution payload inputs
and outputs
can be provided either as mapping (keys are the IDs, values are the content), or as listing (each item has content and "id"
field) interchangeably. When working with OGC API - Processes
compliant services, the mapping representation should be preferred as it is the official schema, is more compact, and it allows inline specification of literal data (values provided without the nested value
field). The listing representation is the older format employed during previous OGC
testbed developments.
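Converting between the two representations is mechanical. The following sketch maps the older listing form to the preferred mapping form; the helper name is hypothetical, and note that field naming of literal values (data vs value) may also differ between the two representations, which this simple sketch does not handle:

```python
def inputs_listing_to_mapping(listing):
    # Move each item's "id" to the mapping key; the remaining fields
    # become the value content.
    mapping = {}
    for item in listing:
        content = dict(item)
        mapping[content.pop("id")] = content
    return mapping
```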
Note
Other parameters can be added to the request to provide further functionalities. The above fields are the minimum requirements to request a Job. Please refer to the OpenAPI Execute_ definition for all applicable features.
- proc_exec_body and proc_exec_mode for details applicable to Weaver specifically.
- OGC API - Processes, Process Outputs for more general details on the transmissionMode parameter.
- OGC API - Processes, Execution Mode for more general details on the execution negotiation (formerly with the mode parameter, more recently with the Prefer header).
- _ and _ for a complete listing of available response formats considering all other parameters.
Changed in version 4.20.
With the addition of Process
revisions (see Update Operation <proc_op_update> section), a registered Process
specified only by {processID}
will execute the latest revision of that Process
. An older revision can be executed by adding the tagged version in the path ({processID}:{version}
) or adding the request query parameter version
.
The inputs definition is the most important section of the request body. It is also the only one that is strictly required when submitting the execution request, even for a no-input process (an empty mapping is needed in such a case). It defines which parameters to forward to the referenced Process to be executed. All id elements in this Job request body must correspond to valid inputs from the definition returned by the DescribeProcess <proc_op_describe> response. Obviously, all formatting requirements (i.e.: proper file MIME-types), data types (e.g.: int, string, etc.) and validation rules (e.g.: minOccurs, AllowedValues, etc.) must also be fulfilled. When providing files as input, multiple protocols are supported. See the later section File Reference Types for details.
The outputs section defines, for each id corresponding to the Process definition, how to report the produced outputs from a successful Job completion. For the time being, Weaver only implements the reference result mode, as this is the most common variation. In this case, the produced file is stored locally and exposed externally with the returned reference URL. The other mode, value, returns the contents directly in the response instead of the URL.
When the outputs section is omitted, it simply means that the Process to be executed should return all outputs it offers in the created Job Results <proc_op_result>. In such a case, because no representation mode is specified for individual outputs, Weaver automatically selects reference, as it makes all outputs more easily accessible with distinct URLs afterwards. If the outputs section is specified, but one of the outputs defined in the Process Description <proc_op_describe> is not included, that output should be omitted from the produced results. For the time being, because only the reference representation is offered for produced output files, this filtering is not implemented, as it offers no additional advantage for files accessed directly with their distinct URLs. This could be added later if a Multipart raw data representation is required. Please _ to request this feature if it is relevant for your use cases.
Filtering of outputs is not implemented (everything is always available). #380
The other parameters presented in the above examples, namely mode and response, are further detailed in the following proc_exec_mode section.
In order to select how to execute a Process, either synchronously or asynchronously, the Prefer header should be specified. If omitted, Weaver defaults to asynchronous execution. To explicitly execute asynchronously, Prefer: respond-async should be used. Otherwise, synchronous execution can be requested with Prefer: wait=X, where X is the duration in seconds to wait for a response. If no worker becomes available within that time, or if this value is greater than weaver.exec_sync_max_wait, the Job will resume asynchronously and the corresponding response will be returned. Furthermore, synchronous and asynchronous execution of a Process can only be requested for the corresponding jobControlOptions it reports as supported in its Process Description <proc_op_describe>. It is important to provide the jobControlOptions parameter with the applicable modes when Deploying a Process <proc_op_deploy> to allow it to run as desired. By default, Weaver assumes that deployed processes are only asynchronous, to handle longer operations.
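Building the Prefer header for either mode can be sketched as follows. The helper name is hypothetical; the header values come from the description above:

```python
def execution_prefer_header(wait_seconds=None):
    # No wait given: explicit asynchronous execution preference.
    if wait_seconds is None:
        return {"Prefer": "respond-async"}
    # Otherwise, request synchronous execution with a maximum wait time.
    return {"Prefer": f"wait={int(wait_seconds)}"}
```

These headers would be attached to the Execute request alongside the JSON body.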
By default, every proc_builtin
Process
can accept both modes. All previously deployed processes will only allow asynchronous execution, as only this one was supported. This should be reported in their jobControlOptions
.
Warning
It is important to remember that the Prefer
header is indeed a preference. If Weaver deems it cannot allocate a worker to execute the task synchronously within a reasonable delay, it can enforce asynchronous execution. The asynchronous mode is also prioritized for running longer Jobs submitted over the task queue, as this allows Weaver to offer better availability for all requests submitted by its users. The synchronous mode should be reserved for very quick and relatively low-intensity computation operations.
The mode field displayed in the body is another method to tell whether to run the Process in a blocking (sync) or non-blocking (async) manner. Note that support is limited for mode sync, as this use case is often more cumbersome than async execution. Effectively, sync mode requires having a task worker executor available to run the Job (otherwise it fails immediately due to a lack of processing resources), and the requester must wait for the whole execution to complete to obtain the result. Given that a Process could take a very long time to complete, it is not practical to execute it in this manner and potentially have to wait hours to retrieve outputs. Instead, the preferred and default approach is to request an async Job execution. When doing so, Weaver adds the task to a queue for processing and immediately returns a Job identifier and Location where the user can probe for its status, using the Monitoring <proc_op_monitor> request. As soon as any task worker becomes available, it will pick up any leftover queued Job to execute it.
Note
The mode
field is an older methodology that precedes the official OGC API - Processes
method using the Prefer
header. It is recommended to employ the Prefer
header that ensures higher interoperability with other services using the same standard. The mode
field is deprecated and preserved only for backward compatibility purpose.
When requesting a synchronous execution, and provided a worker was available to pick up and complete the task before the maximum wait time was reached, the final status will be returned directly. Therefore, the contents obtained this way will be identical to any following Job Status <proc_op_status> request. If no worker is available, or if the worker that picked the Job cannot complete it in time (either because it takes too long to execute or had to wait on resources for too long), the Job execution will automatically switch to asynchronous mode.
The distinction between an asynchronous or synchronous response when executing a Job can be observed in multiple ways. The easiest is with the HTTP status code of the response: 200 for a Job entirely completed synchronously, and 201 for a created Job that should be monitored <proc_op_monitor> asynchronously. Another method is to observe the "status" value. Effectively, a Job that is executed asynchronously will return status information contents, while a synchronous Job will return the results directly, along with a Location header referring to the equivalent contents returned by GetStatus <proc_op_status> as in the asynchronous Job case. It is also possible to extract the Preference-Applied response header, which will clearly indicate whether the submitted Prefer header was respected (because it could be honored with available worker resources) or not. In general, this means that if the Job submission request supplied Prefer: wait=X but the response does not echo it in the Preference-Applied value, it is safe to assume Weaver decided to queue the Job for asynchronous execution. That Job could be executed immediately, or at a later time, according to worker availability.
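The checks described above can be sketched as small helpers. These are illustrative heuristics based only on this description, with hypothetical names:

```python
def completed_synchronously(status_code):
    # 200: Job entirely completed synchronously;
    # 201: Job created, to be monitored asynchronously.
    if status_code == 200:
        return True
    if status_code == 201:
        return False
    raise ValueError(f"unexpected execution response status: {status_code}")

def prefer_honored(requested_prefer, preference_applied):
    # The Preference-Applied response header echoes the submitted
    # Prefer header value when the preference was respected.
    return preference_applied == requested_prefer
```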
It is also possible that a failed Job, even when synchronous, will respond with contents equivalent to the status location instead of results. This is because it is impossible for Weaver to return the result(s), as outputs would not be generated by the incomplete Job.
Finally, the response
parameter defines how to return the results produced by the Process
. When response=document
, regardless of mode=async
or mode=sync
, and regardless of requested outputs transmissionMode=value
or transmissionMode=reference
, the results will be returned in a JSON
format containing either literal values or URL references to produced files. If mode=async
, this results document is obtained with _ request, while mode=sync
returns it directly. When response=raw
, the specific contents (type and quantity), HTTP Link
headers or a mix of those components depends both on the number of available Process
outputs, which ones were requested, and how they were requested (i.e.: transmissionMode
). It is also possible that further content negotiation gets involved according to the Accept
header and available Content-Type
of the outputs if multiple formats are supported by the Process
. For more details regarding those combinations, the official _ and _ should be employed as references.
For any of the previous combinations, it is always possible to obtain Job
outputs, along with logs, exceptions and other details using the proc_op_result
endpoints.
Once the Job
is submitted, its status should initially switch to accepted
. This effectively means that the Job
is pending execution (task queued), but is not yet executing. When a worker retrieves it for execution, the status will change to started
for preparation steps (i.e.: allocating resources, retrieving required parametrization details, etc.), followed by running
when effectively reaching the execution step of the underlying Application Package
operation. This status will remain as such until the operation completes, either with succeeded
or failed
status.
At any moment during asynchronous execution, the Job
status can be requested using _. Note that depending on the timing at which the user executes this request and the availability of task workers, it is possible that the Job
is already in running
state, or even failed
if a problem was detected early.
When the Job
reaches its final state, multiple parameters will be adjusted in the status response to indicate its completion, notably the completed percentage, time it finished execution and full duration. At that moment, the requests for retrieving either error details or produced outputs become accessible. Examples are presented in Result <proc_op_result>
section.
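The asynchronous monitoring loop described above can be sketched as follows. This is a minimal illustration: the `fetch_status` callable stands in for a Job Status request and is an assumption, and the terminal set is reduced to the statuses discussed here (a real client would rely on weaver.status.JOB_STATUS_VALUES).

```python
import time

# Terminal statuses as described above; a reduced set for illustration.
FINAL_STATUSES = {"succeeded", "failed"}

def monitor(fetch_status, interval=5, max_polls=100):
    """Poll the Job status until a final state is reached.

    'fetch_status' stands in for a GetStatus request returning the current
    "status" value (accepted -> started -> running -> succeeded/failed).
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in FINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError("Job did not reach a final status within the polling budget")
```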
For each proc_types
known by Weaver, specific Workflow
step implementations must be provided.
In order to simplify the chaining procedure of file references, step implementations are only required to provide the relevant methodology for their Deploy <proc_op_deploy>
, Execute <proc_op_execute>
, Monitor <proc_op_monitor>
and Result <proc_op_result> operations. Operations related to staging of files, Process
preparation and cleanup are abstracted away from specific implementations to ensure consistent functionalities between each type.
Operations are accomplished in the following order for each individual step:
Step Method | Requirements | Description |
---|---|---|
prepare | I* | Setup any prerequisites for the Process or Job. |
stage_inputs | R | Retrieve input locations (considering remote files and Workflow previous-step staging). |
format_inputs | I* | Perform operations on staged inputs to obtain the desired format expected by the target Process. |
format_outputs | I* | Perform operations on expected outputs to obtain the desired format expected by the target Process. |
dispatch | R,I | Perform the request for remote execution of the Process. |
monitor | R,I | Perform monitoring of the Job status until completion. |
get_results | R,I | Perform operations to obtain the results location in the expected format from the target Process. |
stage_results | R | Retrieve results from the remote Job for local storage using output locations. |
cleanup | I* | Perform any final steps before completing the execution or after a failed execution. |
Note
- All methods are defined within weaver.processes.wps_process_base.WpsProcessInterface.
- Steps marked by * are optional.
- Steps marked by R are required.
- Steps marked by I are implementation dependent.
See weaver.processes.wps_process_base.WpsProcessInterface.execute
for the implementation of the operations order.
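The invocation order from the table above can be illustrated with the following sketch. This is not the actual WpsProcessInterface implementation; it only shows the sequence in which a concrete step type would have its methods called, with the step methods left to a subclass.

```python
# Illustrative sketch (not the actual Weaver code) of the order in which the
# step methods from the table above are invoked for each Workflow step.
# Required (R) steps always run; optional (I*) ones may be no-ops.

class WorkflowStep:
    def execute(self):
        self.prepare()                        # I*: setup prerequisites
        inputs = self.stage_inputs()          # R : retrieve input locations
        inputs = self.format_inputs(inputs)   # I*: adapt inputs for the target Process
        outputs = self.format_outputs()       # I*: adapt expected outputs
        job = self.dispatch(inputs, outputs)  # R,I: request remote execution
        self.monitor(job)                     # R,I: poll Job status until completion
        results = self.get_results(job)       # R,I: obtain results locations
        self.stage_results(results)           # R : store results locally
        self.cleanup()                        # I*: final steps
```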
Most inputs can be categorized into two of the most commonly employed types, namely LiteralData
and ComplexData
. The former represents basic values such as integers or strings, while the latter represents a file reference. Files in Weaver (and WPS
in general) can be specified with any formats
as _.
- cwl-wps-mapping
As for standard WPS
, remote file references are usually limited to http(s)
scheme, unless the process takes an input string and parses the unusual reference from the literal data to process it by itself. On the other hand, Weaver supports all of the following reference schemes:
- http(s)://
- file://
- opensearchfile:// [experimental]
- s3:// [experimental]
The method in which Weaver will handle such references depends on its configuration, in other words, whether it is running as ADES
or EMS
(see: Configuration
), as well as depending on some other CWL
package requirements. These use-cases are described below.
Warning
URL references with a missing scheme are treated as if file://
was used. In most cases, if not always, an execution request should not employ this scheme unless the file is ensured to be at the specific location where the running Weaver application can find it. This scheme is usually only employed as a byproduct of the fetch operation that Weaver uses to provide the file locally to the underlying CWL
application package to be executed.
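The scheme defaulting described in this warning can be sketched with a tiny helper (illustrative; the function is an assumption, not part of Weaver):

```python
from urllib.parse import urlparse

# Hedged sketch of the resolution described in the warning above: a reference
# without an explicit scheme is treated as a local 'file://' path.

def resolve_scheme(reference):
    """Return the scheme of a reference, defaulting to 'file' when missing."""
    return urlparse(reference).scheme or "file"
```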
When Weaver is able to figure out that the Process
needs to be executed locally in ADES
mode, it will fetch all necessary files prior to process execution in order to make them available to the CWL
package. When Weaver is in EMS
configuration, it will always forward remote references (regardless of scheme) exactly as provided as input of the process execution request, since it assumes it needs to dispatch the execution to another ADES
remote server, and therefore only needs to verify that the file reference is reachable remotely. In this case, it becomes the responsibility of this remote instance to handle the reference appropriately. This also avoids potential problems such as if Weaver as EMS
doesn't have authorized access to a link that only the target ADES
would have access to.
When CWL
package defines WPS1Requirement
under hints
for corresponding WPS-1/2 remote processes being monitored by Weaver, it will skip fetching of http(s)://
-based references since that would otherwise lead to useless double downloads (one on Weaver and the other on the WPS
side). The same applies to ESGF-CWTRequirement
employed for ESGF-CWT processes. Because these processes do not always support S3
buckets, and because Weaver supports many variants of S3
reference formats, it will first fetch the S3
reference using its internal _, and then expose this downloaded file as http(s)://
reference accessible by the remote WPS
process.
Note
When Weaver is fetching remote files with http(s)://
, it can take advantage of additional Request Options
to support unusual or server-specific handling of remote references as necessary. This could be employed, for instance, to grant access permissions only to a given ADES
server by providing additional authorization tokens to the requests. Please refer to Configuration of Request Options
for this matter.
Note
An exception to above mentioned skipped fetching of http(s)://
files is when the corresponding Process
types are intermediate steps within a Workflow. In this case, local staging of remote results occurs between each step because Weaver cannot assume any of the remote Provider
is able to communicate with each other, according to potential Request Options
or Data Source
only configured for access by Weaver.
When using S3
references, Weaver will attempt to retrieve the file using server _ and _. Provided that the corresponding S3
bucket can be accessed by the running Weaver application, it will fetch the file and store it locally temporarily for CWL
execution.
Note
When using S3
buckets, authorization is handled through typical AWS
credentials and role permissions. This means that AWS
access must be granted to the application in order to allow it to fetch the file. There are also different S3
reference formats handled by Weaver. Please refer to Configuration of AWS S3 Buckets
for more details.
When using OpenSearch
references, additional parameters are necessary to handle retrieval of specific file URL. Please refer to OpenSearch Data Source
for more details.
The following table summarizes the default behaviour of input file reference handling in different situations when references are received as input arguments of a process execution. For simplification, the keyword <any> is used to indicate that any other value in the corresponding column can be substituted for a given row when applied with conditions of other columns, which results in the same operational behaviour. Elements that behave similarly are also presented together in rows to reduce displayed combinations.
Configuration | Process Type | File Scheme | Applied Operation |
---|---|---|---|
<any> | <any> | opensearchfile:// | Query and re-process [1] |
EMS | <any> | <any> | Forward reference as-is to the remote server |
Footnotes
When processing any of the previous file_reference_types
, the resulting name of the file after retrieval can depend on the applicable scheme. In most cases, the file name is simply the last fragment of the path, whether it is a URL, an S3
bucket or plainly a file directory path. The following cases are exceptions.
Added in version 4.4.0: When using http(s)://
references, the Content-Disposition
header can be provided with filename
and/or filename*
as specified by the RFC 2183
, RFC 5987
and RFC 6266
specifications in order to define a staging file name. Note that Weaver takes this name only as a suggestion and will ignore the preferred name if it does not conform to basic naming conventions for security reasons. As a general rule of thumb, common alphanumeric characters and separators such as dash (-
), underscores (_
) or dots (.
) should be employed to limit chances of errors. If none of the suggested names are valid, Weaver falls back to the typical last fragment of the URL as file name.
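The fallback logic above can be sketched as follows. This is a hedged illustration: the validation pattern and helper name are assumptions, not Weaver's exact rules.

```python
import re
from urllib.parse import unquote, urlparse

# Assumed validation rule: alphanumerics, underscores, dashes and dots only.
_VALID_NAME = re.compile(r"^[\w.-]+$")

def staged_file_name(url, content_disposition=None):
    """Pick a staging file name from Content-Disposition, else the URL tail."""
    if content_disposition:
        match = re.search(r'filename="?([^";]+)"?', content_disposition)
        if match:
            candidate = unquote(match.group(1))
            if _VALID_NAME.match(candidate):
                return candidate  # suggested name accepted
    # fall back to the typical last fragment of the URL path
    return urlparse(url).path.rsplit("/", 1)[-1]
```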
When using s3://
references (or equivalent http(s)://
referring to S3
bucket), the staged file names will depend on the stored object names within the bucket. In that regard, naming conventions from AWS
should be respected.
- _ - _
When using vault://<UUID>
references, the resulting file name will be obtained from the filename
specified in the Content-Disposition
within the uploaded content of the multipart/form-data
request.
- vault_upload
Refer to vault_upload
section for general details about the Vault
feature.
Stored files in the Vault
can be employed as input for proc_op_execute
operation using the provided vault://<UUID>
reference from the response following upload. The Execute <proc_op_execute>
request must also include the X-Auth-Vault
header to obtain access to the file.
Warning
Avoid using the Vault
HTTP location as href
input. Prefer the vault://<UUID>
representation.
The direct Vault
HTTP location SHOULD NOT be employed as input reference to a Process
to ensure its proper interpretation during execution. There are two main reasons for this.
Firstly, using the plain HTTP endpoint will not provide any hint to Weaver about whether the input link is a generic remote file or one hosted in the Vault
. With the lack of this information, Weaver could attempt to download the file to retrieve it for its local Process
execution, creating unnecessary operations and wasting bandwidth since it is already available locally. Furthermore, the Vault
behaviour that deletes the file after its download would cause it to become unavailable upon subsequent access attempts, as it could be the case during handling and forwarding of references during intermediate Workflow
step operations. This could inadvertently break the Workflow
execution.
Secondly, without the explicit Vault
reference, Weaver cannot be aware of the necessary X-Auth-Vault
authorization needed to download it. Using the vault://<UUID>
not only tells Weaver that it must forward any relevant access token to obtain the file, but it also ensures that those tokens are not inadvertently sent to other locations. Effectively, because the Vault
can be used to temporarily host sensitive data for Process
execution, Weaver can better control and avoid leaking the access token to irrelevant resource locations such that only the intended Job
and specific input can access it. This is even more important in situations where multiple Vault
references are required, to make sure each input forwards the respective access token for retrieving its file.
When submitting the Execute <proc_op_execute>
request, it is important to provide the X-Auth-Vault
header with additional reference to the Vault
parameter when multiple files are involved. Each token should be separated by a comma, as detailed below. When only one file refers to the Vault
the parameters can be omitted since there is no need to map between tokens and distinct vault://<UUID>
entries.
../examples/vault-execute.http
The notation (RFC 5234
, RFC 7230#section-1.2
) of the X-Auth-Vault
header is presented below.
X-Auth-Vault = vault-unique / vault-multi
vault-unique = credentials [ BWS ";" OWS auth-param ]
vault-multi  = credentials BWS ";" OWS auth-param 1*( "," OWS credentials BWS ";" OWS auth-param )
credentials  = auth-scheme RWS access-token
auth-scheme  = "token"
auth-param   = "id" "=" vault-id
vault-id     = UUID / ( DQUOTE UUID DQUOTE )
access-token = base64
base64       = <base64, see RFC 4648#section-4>
DQUOTE       = <DQUOTE, see RFC 7230#section-1.2>
UUID         = <UUID, see RFC 4122#section-3>
BWS          = <BWS, see RFC 7230#section-3.2.3>
OWS          = <OWS, see RFC 7230#section-3.2.3>
RWS          = <RWS, see RFC 7230#section-3.2.3>
In summary, the access token can be provided by itself by omitting the Vault
UUID parameter only if a single file is referenced across all inputs within the Execute <proc_op_execute>
request. Otherwise, multiple Vault
references each require specifying both their respective access token and UUID in a comma-separated list.
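Building the header per that grammar can be sketched as follows (the helper is hypothetical; only the resulting header values follow the notation above):

```python
# Hypothetical helper building the X-Auth-Vault header value.
# With a single file, the 'id' parameter may be omitted; with multiple files,
# each token must be paired with its Vault UUID, comma-separated.

def x_auth_vault(tokens):
    """tokens: dict mapping Vault UUID -> access token."""
    if len(tokens) == 1:
        # single reference: the access token can be provided by itself
        return "token {}".format(next(iter(tokens.values())))
    return ",".join(
        "token {}; id={}".format(token, uuid) for uuid, token in tokens.items()
    )
```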
In order to provide OpenSearch
query results as input to Process
for execution, the corresponding Deploy <proc_op_deploy>
request body must be provided with additionalParameters
in order to indicate how to interpret any specified metadata. The appropriate OpenSearch
queries can then be applied prior to the execution to retrieve the explicit file reference(s) of EOImage
elements that were found, which are then submitted to the Job
.
Depending on the desired context (application or per-input) over which the AOI
, TOI
, EOImage
and multiple other metadata search filters are to be applied, their definition can be provided in the following locations within the Deploy <proc_op_deploy>
body.
Context | Location | Role |
---|---|---|
Application | processDescription.process.additionalParameters | http://www.opengis.net/eoc/applicationContext |
Input | processDescription.process.inputs[*].additionalParameters | http://www.opengis.net/eoc/applicationContext/inputMetadata |
The distinction between application or per-input contexts is entirely dependent on the intended processing operation of the underlying Process
, which is why they must be defined by the user deploying the process since there is no way for Weaver to automatically infer how to employ provided search parameters.
In each case, the structure of additionalParameters
should be similar to the following definition:
{
"additionalParameters": [
{
"role": "http://www.opengis.net/eoc/applicationContext/inputMetadata",
"parameters": [
{
"name": "EOImage",
"values": [
"true"
]
},
{
"name": "AllowedCollections",
"values": "s2-collection-1,s2-collection-2,s2-sentinel2,s2-landsat8"
}
]
}
]
}
In each case, it is also expected that the role
should correspond to the location where the definition is provided according to its context from the above table.
For each deployment, processes using EOImage
to be processed into OpenSearch
query results can interpret the following field definitions for mapping against respective inputs or application context.
Name | Values | Context | Description |
---|---|---|---|
EOImage | ["true"] | Input | Indicates that the nested parameters within the current additionalParameters section where it is located define an EOImage. This is to avoid misinterpretation of similar names that could be employed by other kinds of definitions. The Process input's id where this parameter is defined is the name that will be employed to pass down OpenSearch results. |
AllowedCollections | String of comma-separated collection IDs. | Input (same one as EOImage) | Provides a subset of collection identifiers that are supported. During execution, any specified input not respecting one of the defined values will fail OpenSearch query resolution. |
CatalogSearchField | ["<name>"] | Input (other one than EOImage) | String with the relevant OpenSearch query filter name according to the described input. Defines a given Process input id to be mapped against the specified query name. |
UniqueAOI | ["true"] | Application | Indicates that the provided CatalogSearchField (typically bbox) corresponds to a global AOI that should be respected across multiple EOImage inputs. Otherwise (default: ["false"]), each EOImage should be accompanied by its respective AOI definition. |
UniqueTOI | ["true"] | Application | Indicates that the provided CatalogSearchField (typically StartDate and EndDate) corresponds to a global TOI that should be respected across multiple EOImage inputs. Otherwise (default: ["false"]), each EOImage should be accompanied by its respective TOI definition. |
When an EOImage
is detected for a given Process
, any submitted Job
execution will expect the defined inputs in the Process
description to indicate which images to retrieve for the application. Using inputs defined with corresponding CatalogSearchField
filters, a specific OpenSearch
query will be sent to obtain the relevant images. The inputs corresponding to search fields will then be discarded following OpenSearch
resolution. The resolved link(s) for the EOImage
will be substituted within the id
of the input where EOImage
was specified and will be forwarded to the underlying Application Package
for execution.
Note
Collection identifiers are mapped against URL endpoints defined in configuration to execute the appropriate OpenSearch
requests. See conf_data_sources
for more details.
Definitions in the _ request body provide a more detailed example of the expected structure and relevant additionalParameters
locations.
Definitions in _ provide different combinations of inputs, notably for using distinct AOI
, TOI and collections, with or without UniqueAOI
and UniqueTOI
specifiers.
- Multiple and Optional Values
Although CWL
allows output arrays, WPS
does not support them directly, as only single values are allowed for WPS
outputs according to the original specification. To work around this, _ files can be used to provide a single output reference that embeds other references. This approach is also employed and preferred as described in _.
Warning
This feature is being worked on (Weaver Issue #25). Direct support between
- Multiple and Optional Values
By default, Job
results will be hosted under the endpoint configured by weaver.wps_output_url
and weaver.wps_output_path
, and will be stored under directory defined by weaver.wps_output_dir
setting.
Warning
Hosting of results from the file system is NOT handled by Weaver itself. The API will only report the expected endpoints using configured weaver.wps_output_url
. It is up to an alternate service or the platform provider that serves the Weaver application to provide the external hosting and availability of files online as desired.
Each Job
will have its specific UUID employed for all of the output files, logs and status in order to avoid conflicts. Therefore, outputs will be available at the following locations:
{WPS_OUTPUT_URL}/{JOB_UUID}.xml # status location
{WPS_OUTPUT_URL}/{JOB_UUID}.log # execution logs
{WPS_OUTPUT_URL}/{JOB_UUID}/{output.ext} # results of the job if successful
Note
Value WPS_OUTPUT_URL
in the above example is resolved according to weaver.wps_output_url
, weaver.wps_output_path
and weaver.url
, as per conf_settings
details.
When submitting a Job
for execution, it is possible to provide the X-WPS-Output-Context
header. This modifies the output location to be nested under the specified directory or sub-directories.
For example, providing X-WPS-Output-Context: project/test-1
will result in outputs located at:
{WPS_OUTPUT_URL}/project/test-1/{JOB_UUID}/{output.ext}
Note
Values provided by X-WPS-Output-Context
can only contain alphanumeric characters, hyphens, underscores and path separators that result in valid directory and URL locations. The path is assumed relative by design, to be resolved under the WPS
output directory, and will therefore reject any .
or ..
path references. The path also CANNOT start with /
. In such cases, an HTTP error will be immediately raised, indicating the symbols that were rejected when detected within the X-WPS-Output-Context
header.
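These validation rules can be sketched as follows. This is a hedged illustration: the exact server-side pattern may differ, and the helper name is an assumption.

```python
import re

# Assumed rule per the note above: alphanumerics, hyphens, underscores and
# '/' separators; no leading '/', and no '.' or '..' path segments (dots are
# excluded entirely here for simplicity).
_SEGMENT = r"[A-Za-z0-9_-]+"
_CONTEXT = re.compile(r"^{seg}(/{seg})*$".format(seg=_SEGMENT))

def validate_output_context(context):
    """Validate an X-WPS-Output-Context value, raising on rejected symbols."""
    if context.startswith("/"):
        raise ValueError("X-WPS-Output-Context cannot start with '/'")
    if not _CONTEXT.match(context):
        raise ValueError("X-WPS-Output-Context contains rejected symbols")
    return context
```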
If desired, parameter weaver.wps_output_context
can also be defined in the conf_settings
in order to employ a default directory location nested under weaver.wps_output_dir
when X-WPS-Output-Context
header is omitted from the request. By default, this parameter is not defined (empty) in order to store Job
results directly under the configured WPS
output directory.
Note
Header X-WPS-Output-Context
is ignored when using S3 buckets for output location since they are stored individually per Job
UUID, and hold no relevant context location. See also conf_s3_buckets
.
Added in version 4.3.0: Addition of the X-WPS-Output-Context
header.
When submitting a Job
for execution, it is possible to provide the notification_email
field. Doing so will tell Weaver to send an email to the specified address with success or failure details upon Job
completion. The format of the email is configurable from the weaver.ini.example file with email-specific settings (see: Configuration
).
Monitoring the execution of a Job
consists of polling the status Location
provided from the Execute
operation and verifying the indicated status
for the expected result. The status
can correspond to any of the values defined by weaver.status.JOB_STATUS_VALUES
accordingly to the internal state of the workers processing their execution.
When targeting a Job
submitted to a Weaver instance, monitoring is usually accomplished through the OGC API - Processes
endpoint using _, which will return a JSON
body. Alternatively, the XML
status location document returned by the wps_endpoint
could also be employed to monitor the execution.
In general, both endpoints should be interchangeable, using the mapping below. The Job
monitoring process keeps both contents equivalent according to their standard. For convenience, requesting the Execute
with Accept: <content-type>
header corresponding to either JSON
or XML
should redirect to the response of the relevant endpoint, regardless of where the original request was submitted. Otherwise, the default contents format is employed according to the chosen location.
Standard | Contents | Location |
---|---|---|
OGC API - Processes | JSON | {WEAVER_URL}/jobs/{JobUUID} |
WPS | XML | {WEAVER_WPS_OUTPUTS}/{JobUUID}.xml |
For the WPS
endpoint, refer to conf_settings
.
In the case of successful Job
execution, the outputs can be retrieved with _ request to list each corresponding output id
with the generated file reference URL. Keep in mind that the purpose of those URLs is only to fetch the results (not persistent storage), and they could therefore be purged after some reasonable amount of time. The format should be similar to the following example, with minor variations according to Configuration
parameters for the base WPS
output location:
{
"outputs": [
{
"id": "output",
"href": "{WEAVER_URL}/wpsoutputs/f93a15be-6e16-11ea-b667-08002752172a/output_netcdf.nc"
}
]
}
For the OGC
compliant endpoint, the request can be employed instead. In the event of a Job
executed with response=document
, the contents will be very similar. On the other hand, a Job
submitted with response=raw
can produce many alternative variations according to OGC
requirements. For this reason, the outputs endpoint will always provide all data and file references of the Job
, no matter the original response
format. The outputs endpoint can also receive additional query parameters, such as schema
, to return contents formatted similarly to results, but enforcing a JSON
body as if response=document
was specified during submission of the Process
execution.
In order to better understand the parameters that were submitted during Job
creation, the _ can be employed. This will return both the data and reference inputs that were submitted, as well as the requested outputs to retrieve any relevant transmissionMode
definition.
In situations where the Job
resulted in a failed
status, the _ can be used to retrieve the potential cause of failure, by capturing any raised exception. Below is an example of such exception details.
[
"builtins.Exception: Could not read status document after 5 retries. Giving up."
]
The returned exceptions are often better understood when compared against, or in conjunction with, the logs that provide details over each step of the operation.
Any Job
executed by Weaver will provide minimal log information, such as operation setup, the moment when it started execution and the latest status. The extent of other log entries will more often than not depend on the verbosity of the underlying process being executed. When executing an Application Package
, Weaver tries as best as possible to collect standard output and error streams to report them through log and exception lists.
Since Weaver can only report as many details as provided by the running application, it is recommended that Application Package
implementers provide progressive status updates when developing their package in order to help understand problematic steps in the event of process execution failures. In the case of remote WPS
processes monitored by Weaver for example, this means gradually reporting process status updates (e.g.: calling WPSResponse.update_status
if you are using _, see: _), while using print
and/or logging
operation for scripts or Docker
images executed through CWL
CommandLineTool
.
Note
Job
logs and exceptions are a Weaver-specific implementation. They are not part of traditional _.
A minimalistic example of logging output is presented below. This can be retrieved using _ request, at any moment during Job
execution (with logs up to that point in time) or after its completion (for full output). Note again that the more the Process
is verbose, the more tracking will be provided here.
../../weaver/wps_restapi/examples/job_logs.json
Note
All endpoints to retrieve any of the above information about a Job
can either be requested directly (i.e.: /jobs/{jobID}/...
) or with equivalent Provider
and/or Process
prefixed endpoints, if the requested Job
did refer to those Provider
and/or Process
. A local Process
would have its Job
references as /processes/{processId}/jobs/{jobID}/...
while a proc_remote_provider
will use /provider/{providerName}/processes/{processId}/jobs/{jobID}/...
.
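The equivalent endpoint prefixes can be illustrated with a small helper (hypothetical; the path templates are copied from the text above):

```python
# Hypothetical helper composing the equivalent Job endpoint variants
# described above (direct, Process-prefixed, Provider-prefixed).

def job_endpoint(job_id, process_id=None, provider_name=None):
    if provider_name and process_id:
        return "/provider/{}/processes/{}/jobs/{}".format(provider_name, process_id, job_id)
    if process_id:
        return "/processes/{}/jobs/{}".format(process_id, job_id)
    return "/jobs/{}".format(job_id)
```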
The Vault
is available as secured storage for uploading files to be employed later for Process
execution (see also file_vault_inputs
).
Note
The Vault
is a specific feature of Weaver. Other ADES
, EMS
and OGC API - Processes
servers are not expected to provide this endpoint nor support the vault://<UUID>
reference format.
Refer to conf_vault
for applicable settings for this feature.
When upload succeeds, the response will return a Vault
UUID and an access_token
to access the file. Uploaded files cannot be accessed unless the proper credentials are provided. Requests toward the Vault
should therefore include an X-Auth-Vault: token {access_token}
header in combination to the provided Vault
UUID in the request path to retrieve the file contents. The upload response will also include a file_href
field formatted with a vault://<UUID>
reference to be used for file_vault_inputs
, as well as a Content-Location
header of the contextual Vault
endpoint for that file.
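Turning the upload response fields described above into an execution input can be sketched as follows (the helper and the response dict shape are assumptions based on the fields named in the text):

```python
# Hedged sketch: build an Execute input and its access header from a
# (hypothetical) Vault upload response containing the fields named above.

def vault_input(upload_response, input_id):
    """Build an execute input referencing the uploaded Vault file."""
    href = upload_response["file_href"]      # e.g. a "vault://<UUID>" reference
    token = upload_response["access_token"]  # credential returned by the upload
    inputs = {input_id: {"href": href}}
    headers = {"X-Auth-Vault": "token {}".format(token)}
    return inputs, headers
```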
Download of the file is accomplished using the _ request. In order to either obtain the file metadata without downloading it, or simply to validate its existence, the _ request can be used. This HEAD request can be queried any number of times without affecting the file from the Vault
. For both HTTP methods, the X-Auth-Vault
header is required.
Note
The Vault
acts only as temporary file storage. For this reason, once the file has been downloaded, it is immediately deleted. Download can only occur once. It is assumed that the resource that must employ it will have created a local copy from the download and the Vault
no longer needs to preserve it. This behaviour intends to limit the duration for which potentially sensitive data remains available in the Vault
as well as performing cleanup to limit storage space.
Using the Weaver CLI or Python client <cli>
, it is possible to upload local files automatically to the Vault
of a remote Weaver server. This can help users host their local files for remote Process
execution. By default, the cli
will automatically convert any local file path provided as execution input into a vault://<UUID>
reference to make use of the Vault
self-hosting from the target Weaver instance. It will also update the provided inputs or execution body to apply any transformed vault://<UUID>
references transparently. This will allow the executed Process
to securely retrieve the files using file_vault_inputs
behaviour. Transmission of any required authorization headers is also handled automatically when using this approach.
It is also possible to manually provide vault://<UUID>
references or endpoints if those were uploaded beforehand using the upload
operation, but the user must also generate the X-Auth-Vault
header manually in such case.
Section file_vault_inputs
provides more details about the format of X-Auth-Vault
for submission of multiple inputs.
In order to manually upload files, the below code snippet can be employed.
../examples/vault_upload.py
This should automatically generate a similar request to the result below.
../examples/vault-upload.http
Warning
When providing literal HTTP request contents as above, make sure to employ CRLF
instead of plain LF
for separating the data using the boundary. Also, make sure to omit any additional LF
between the data and each boundary if this could impact parsing of the data itself (e.g.: as in the case of non-text readable base64 data) to avoid modifying the file contents during upload. Some additional newlines are presented in the above example only for readability purposes. It is recommended to use utilities like the Python example or the Weaver CLI <cli>
to avoid such issues during request content generation. Please refer to RFC 7578#section-4.1
for more details regarding multipart content separators.
Note that the Content-Type
embedded within the multipart content in the above example (not to be confused with the actual Content-Type
header of the request uploading the file) can be important if the destination input of the Process
that will consume the Vault
file during execution must select a specific Media-Type among multiple supported ones. This value can be employed to generate the explicit format
portion of the input in case it cannot be resolved automatically from the file contents, unless it is explicitly provided once more for that input within the Execute <proc_op_execute>
request body.
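As an illustrative sketch only, an Execute <proc_op_execute>
request body providing the media type explicitly for a Vault file reference could resemble the following. The input identifier ("data"), output identifier ("output"), and Vault UUID are hypothetical; the "type" field follows the OGC API - Processes qualified-value schema and should be adapted to the deployed process.

```json
{
  "inputs": {
    "data": {
      "href": "vault://11111111-2222-3333-4444-555555555555",
      "type": "application/x-netcdf"
    }
  },
  "outputs": {
    "output": {"transmissionMode": "reference"}
  }
}
```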
This endpoint is available if the weaver.wps
setting is enabled (true
by default). The specific location where WPS
requests will be accessible depends on the resolution of the relevant conf_settings
, namely weaver.wps_path
and weaver.wps_url
.
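For illustration, these settings could be defined as follows in the application configuration. The setting names come from the section above; the path and URL values shown are placeholders to be adapted to the actual deployment.

```ini
[app:main]
# enable the WPS-1/2 endpoint (enabled by default)
weaver.wps = true
# path under the application where WPS requests are served
weaver.wps_path = /ows/wps
# externally advertised URL of the WPS endpoint
weaver.wps_url = https://weaver.example.com/ows/wps
```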
Details regarding the contents of each request are provided in the schemas under _.
Note
Using the WPS
endpoint allows less control over functionalities than the corresponding OGC API - Processes
(WPS-REST
) endpoints, since it is the preceding standard.
This section highlights the additional behaviour available only through an EMS
-configured Weaver instance. Some of these points are already described in other sections, but are briefly restated here for conciseness.
When using either the EMS
or HYBRID
configurations, Process
executions are dispatched to the relevant ADES
or to another HYBRID
server supporting _ when inputs are matched against one of the configured Data Source
entries. Minimal implementations of OGC API - Processes
can also act as external Provider
instances to which executions are dispatched, but in the case of core implementations, the Process
must already be available, since it cannot be deployed.
In more detail, when an _ request is received, Weaver will analyse any file references in the specified inputs and try to match them against the configured Data Source
definitions. When a match is found and the corresponding file_reference_types
indicate that the reference is located remotely on a known Data Source
provider that should take care of its processing, Weaver will attempt to _ the targeted Process
(and the underlying Application Package
), followed by its remote execution. It will then monitor the Job
until completion and retrieve the results if the full operation was successful.
The Data Source
configuration therefore indicates to Weaver how to map a given data reference to the specific instance or server where that data is expected to reside. This procedure effectively allows Weaver to deliver applications close to the data, which can be far more efficient (both in terms of time and quantity) than pulling the data locally when Data Source
contents become substantial. Furthermore, it allows Data Source
providers to define custom or private data retrieval mechanisms, where the data cannot be exposed or offered externally, but remains available for use when requested.
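The matching step described above can be sketched as follows. The configuration structure and helper below are hypothetical simplifications for illustration; the actual Data Source
configuration format is described in the conf_data_sources
section referenced in the footnotes.

```python
from urllib.parse import urlparse

# Hypothetical Data Source configuration for illustration only: each provider
# is matched here by the network location of input file references (real
# entries are defined through the 'conf_data_sources' configuration).
DATA_SOURCES = {
    "ades-a": {"netloc": "data.ades-a.example.com", "ades": "https://ades-a.example.com"},
    "ades-b": {"netloc": "data.ades-b.example.com", "ades": "https://ades-b.example.com"},
}


def match_data_source(href):
    """Return the Data Source name whose host matches the file reference,
    or None when no provider is known for it (execution then stays local)."""
    netloc = urlparse(href).netloc
    for name, source in DATA_SOURCES.items():
        if netloc == source["netloc"]:
            return name
    return None
```

When a match is returned, the execution would be dispatched to that provider's ADES endpoint rather than fetching the data locally.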
Footnotes
Specific details about configuration of Data Source
are provided in the conf_data_sources
section.
Details regarding opensearch_data_source
are also relevant for resolving possible Data Source
provider matches when the applicable file_reference_types
are detected.
References defined by opensearchfile://
will trigger an OpenSearch
query using the provided URL as well as other additional input parameters (see OpenSearch Data Source
). After processing of this query, retrieved file references will be re-processed using the logic summarized in the table for the given use case.↩
When the process refers to a remote WPS-REST
process (i.e.: a remote WPS
instance that supports REST bindings but that is not necessarily an ADES
), Weaver simply wraps and monitors its remote execution; files are therefore handled just as for any other type of remote WPS
-like server. When the process contains an actual CWL
Application Package
that defines a CommandLineTool
class (including applications with a Docker
image requirement), files are fetched since the application will be executed locally. See CWL CommandLineTool
, WPS-REST
and Remote Provider
for further details.↩
When a file://
(or empty scheme) reference maps to a local file that needs to be exposed externally for another remote process, the conversion to the http(s)://
scheme employs the weaver.wps_output_url
setting to form the resulting URL reference. The file is placed in weaver.wps_output_dir
to expose it as an HTTP(S) endpoint. Note that the HTTP(S) servicing of the file is not handled by Weaver itself; it is assumed that the server where Weaver is hosted, or another service, takes care of this task.↩
When an s3://
file is fetched, it gets downloaded to a temporary file://
location, which is NOT necessarily exposed as http(s)://
. Only if execution is transferred to a remote process that is not expected to support S3
references does the file get converted as described above.↩
When a vault://<UUID>
file is specified, the remote process needs to access it using the hosted Vault
endpoint. Therefore, Weaver converts any Vault reference to the corresponding location and inserts the access token in the request headers to authorize download from the remote server. See file_vault_inputs
and vault_upload
for more details.↩
Workflows are only available on EMS
and HYBRID
instances. Since they chain processes, no fetch is needed, as the sub-step processes will do it instead as needed. See the Workflow
process as well as CWL Workflow
for more details.↩
When a vault://<UUID>
file is specified, the local WPS-REST
process can make use of it directly. The file is therefore retrieved from the Vault
using the provided UUID and access token to be passed to the application. See file_vault_inputs
and vault_upload
for more details.↩
Configuration HYBRID
applies here in cases where Weaver acts as an EMS
for remote dispatch of Process
execution based on the applicable file_reference_types
.↩