# Weaver Process Management Demonstration

Weaver is primarily an Execution Management Service (EMS) that executes workflows chaining the inputs and outputs of various applications and Web Processing Services. The EMS dispatches the remote execution of each process in a workflow chain to one or more registered Application, Deployment and Execution Services (ADES), and transfers files between instances as needed when they are spread across multiple remote locations.<br/><br/>
[Weaver Documentation](https://pavics-weaver.readthedocs.io/en/latest/)  <br/>

Weaver is built around the concept of processes to be executed.
In this demonstration, the processes are described with the [Common Workflow Language (CWL)](https://www.commonwl.org/).
Here you will find all the steps to:
1. Build the process
2. Deploy the process on Weaver
3. Execute the process on Weaver
4. Monitor the status of a running process
5. Access the result of an executed process

# 1. Build a process
To build your own process, you will need the following:

1. The source code for the function, module or application to be executed
2. A dockerized environment capable of executing the source code
3. A CWL document describing your process
4. A definition of the inputs required for process execution

### 1.1 The code to be executed as a process

The module used in this case is a wrapper around the [SentinelSat](https://sentinelsat.readthedocs.io/en/stable/) client.
Essentially, it takes the arguments passed to the Python executable and runs SentinelSat with them; we adapted it to download geospatial Sentinel images.


<br/>
<b>Here is an example of execution using Python</b>

```shell
python sentinelsat_wrapper.py -d --credentials <credentials_file_path> --name <image_name> 
```
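The wrapper's source is not shown in this notebook. As a rough idea of how such a command line could be parsed, here is a minimal sketch using `argparse`. The function name `parse_wrapper_args`, the meaning given to `-d`, and the placeholder values are assumptions; only the three flags come from the command above.

```python
import argparse


def parse_wrapper_args(argv=None):
    """Parse the three options used in the command above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="sentinelsat_wrapper.py")
    # Assumption: -d is a boolean switch that triggers the download.
    parser.add_argument("-d", dest="download", action="store_true")
    # Path to the file holding the credentials.
    parser.add_argument("--credentials", required=True)
    # Name of the product (image) to download.
    parser.add_argument("--name", required=True)
    return parser.parse_args(argv)


# Placeholder values, for illustration only.
args = parse_wrapper_args(
    ["-d", "--credentials", "credentials.json", "--name", "S1A_EXAMPLE"]
)
print(args.download, args.credentials, args.name)
```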

### 1.2 Dockerize our executable code

For this step, we need to create a Dockerfile that encapsulates all the requirements and dependencies needed to run our Python code. The arguments needed by the code are then passed through the Docker container command. For this to work, we have to build a Docker image, which turns our Python executable into an executable Docker container. <br/>
[Docker documentation for Python development](https://docs.docker.com/language/python/)

In this case, we name the Docker image of sentinelsat_wrapper: <u>`daccs-eo-sentinelsat`</u>

<b> Execution example </b></br>  
```shell
docker run daccs-eo-sentinelsat -d --credentials <credentials_file_path> --name <image_name> 
```

As we can see, the only change is that the code now runs through Docker, which contains everything the code needs to work.
</br></br>

<b> GitLab Container Registry </b>
</br>

In our case, once the Docker image is built, it is pushed to our [GitLab Registry](https://docs.gitlab.com/ee/user/packages/container_registry/) so that it can be pulled just as it would be from Docker Hub, for example.
The name of the image used in the next steps now becomes the URL pointing to the registry that holds our Docker image:
<u>`registry.gitlab.com/crim.ca/public/daccs/daccs/daccs-eo-sentinelsat` </u>

</br>
The execution command becomes:

```shell 
docker run registry.gitlab.com/crim.ca/public/daccs/daccs/daccs-eo-sentinelsat -d --credentials <credentials_file_path> --name <image_name> 
```

### 1.3 CWL document

We use the application definition configuration provided by [Common Workflow Language](https://www.commonwl.org/user_guide/#common-workflow-language-user-guide) to provide a description of a process to be executed by Weaver. 

A CWL document serves as an interface between our module and all it needs to be executed by Weaver. 


In this CWL document, there are 3 main parts:
1. Docker requirements
    - This is the image name or the URL of the image to be pulled from a registry by Weaver
2. Inputs
    - These are the arguments passed through the Docker container to our Python code
    - In our case : ```-d --credentials <credentials_file_path> --name <image_name>```
3. Outputs
    - In this case the output is a downloaded file, so we simply check that the file exists

All of this is set inside a CWL YAML file : <b> sentinelsat-application.yml </b>


```yaml

#!/usr/bin/env cwl-runner

class: CommandLineTool
cwlVersion: v1.0

label: Application to download a Sentinel product.

# === DOCKER REQUIREMENT ===
# The docker image to be pulled and executed as a process
requirements:
  DockerRequirement:
    dockerPull: registry.gitlab.com/crim.ca/public/daccs/daccs/daccs-eo-sentinelsat:0.2.0 

# === INPUTS ===
# Default arguments
arguments: [ "-d" ]
# Inputs needed for our process
inputs:
  credentials: 
    type: File
    inputBinding:
      position: 1
      prefix: --credentials
  image:
    type: string
    inputBinding:
      prefix: --name
      position: 2

# === OUTPUTS ===
# Look for a downloaded .zip file in the runtime directory.
# That way we assume the image has been downloaded.
outputs:
  output:
    type: File[]
    outputBinding:
      glob: $(runtime.outdir)/*.zip 
      
```
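To make the role of `arguments`, `prefix` and `position` more concrete, the sketch below rebuilds the argument list that these bindings describe. It is a simplified illustration, not the logic of a real CWL runner: it assumes every input has an `inputBinding` with a `prefix`, treats plain `arguments` entries as position 0, and breaks ties by declaration order. The function name `build_command_line` and the input values are hypothetical.

```python
def build_command_line(arguments, inputs, values):
    """Order CWL-style bindings by position and return the argument list."""
    # Plain string entries in `arguments` carry no explicit position,
    # which is treated here as position 0.
    parts = [(0, index, [arg]) for index, arg in enumerate(arguments)]
    for index, (name, spec) in enumerate(inputs.items()):
        binding = spec["inputBinding"]
        tokens = [binding["prefix"], str(values[name])]
        parts.append((binding.get("position", 0), len(arguments) + index, tokens))
    # Sort by position, then by declaration order (a simplification).
    parts.sort(key=lambda part: (part[0], part[1]))
    return [token for _, _, tokens in parts for token in tokens]


inputs = {
    "credentials": {
        "type": "File",
        "inputBinding": {"position": 1, "prefix": "--credentials"},
    },
    "image": {
        "type": "string",
        "inputBinding": {"prefix": "--name", "position": 2},
    },
}
# Placeholder values, for illustration only.
values = {"credentials": "/path/to/credentials.json", "image": "S1A_EXAMPLE"}
command = build_command_line(["-d"], inputs, values)
print(" ".join(command))
```

The resulting order matches the `docker run` command shown earlier: `-d` first, then `--credentials`, then `--name`.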

### 1.4 A definition of the inputs required for process execution

This is a JSON-like object that contains all the inputs and information needed for the process to be executed.

In this case, it should look something like this, where `<credentials_file_path>` and `<image_name>` stand for the actual values:
```json
{
  "inputs": [
    {
      "id": "credentials",
      "href": "<credentials_file_path>",
      "type": "application/json"
    },
    {
      "id": "image",
      "data": "<image_name>"
    }
  ]
}
```
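When the values come from Python variables, the same structure can be assembled by a small helper and serialized to check that it is valid JSON. This is only a sketch: the helper name `build_execute_payload` and the sample values are hypothetical.

```python
import json


def build_execute_payload(credentials_path, product_name):
    """Return the inputs definition for the download process as a dict."""
    return {
        "inputs": [
            {
                "id": "credentials",
                "href": str(credentials_path),
                "type": "application/json",
            },
            {
                "id": "image",
                "data": product_name,
            },
        ]
    }


# Placeholder values, for illustration only.
payload = build_execute_payload("/tmp/credentials.json", "S1A_EXAMPLE")
print(json.dumps(payload, indent=2))
```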

***
# Weaver 
We now have all the components needed for Weaver to execute our code/process as a job.

#### Requirements
- Weaver >=4.30.0 
- The Weaver client installed : https://pavics-weaver.readthedocs.io/en/latest/installation.html
    - After the installation, make sure this notebook has access to the Weaver client conda environment.

### Weaver Definitions

In [2]:
from weaver.cli import WeaverClient
from requests_magpie import MagpieAuth

from pprint import pprint
import requests
import json
import os

In [None]:
# Weaver setup
WEAVER_URL = os.getenv('WEAVER_URL')

# Authentication 
MAGPIE_URL = os.getenv('MAGPIE_URL')
MAGPIE_USER = os.getenv('MAGPIE_USER')
MAGPIE_PASS = os.getenv('MAGPIE_PASS')

# Magpie Authentication
aCMagpieAuth = MagpieAuth(MAGPIE_URL, MAGPIE_USER, MAGPIE_PASS)

# Instantiate a Weaver client with Magpie authentication
wclient = WeaverClient(url=WEAVER_URL, auth=aCMagpieAuth)
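Since `os.getenv` silently returns `None` for an unset variable, a configuration problem would only surface later, when the client is used. A small check such as the following sketch can fail early instead; the helper name `require_env` and the example URLs are hypothetical.

```python
import os


def require_env(names, environ=None):
    """Return the requested variables, or raise if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Example with an explicit mapping (placeholder values) instead of the real environment.
settings = require_env(
    ["WEAVER_URL", "MAGPIE_URL"],
    environ={
        "WEAVER_URL": "https://example.com/weaver",
        "MAGPIE_URL": "https://example.com/magpie",
    },
)
print(settings["WEAVER_URL"])
```

Called without the `environ` argument, the helper reads from the real environment, for example `require_env(["WEAVER_URL", "MAGPIE_URL", "MAGPIE_USER", "MAGPIE_PASS"])`.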

### Weaver Capabilities
This lets us see which processes are available on the instance the Weaver client points to.

In [4]:
def print_available_processes():
    print('=== Available Processes ===')
    response = wclient.capabilities()
    processes = response.body['processes']
    pprint(processes)

In [62]:
print_available_processes()

=== Available Processes ===
['file_index_selector',
 'file2string_array',
 'jsonarray2netcdf',
 'metalink2netcdf']


## 2. Deploy the process on Weaver
[Deploy Documentation](https://pavics-weaver.readthedocs.io/en/latest/autoapi/weaver/cli/index.html#weaver.cli.WeaverClient.deploy)</br>

This is where we send our CWL process to Weaver. In our case, this would be the `sentinelsat_download_process.yml` file. 

For educational purposes, however, this notebook uses Python to send it as a JSON-like object.

In [104]:
sentinelsat_download_process_CWL = {
  "class": "CommandLineTool",
  "cwlVersion": "v1.0",
  "label": "Application to download a Sentinel product.",
  "requirements": {
    "DockerRequirement": {
      "dockerPull": "registry.gitlab.com/crim.ca/public/daccs/daccs/daccs-eo-sentinelsat:0.2.0"
    }
  },
  "arguments": [
    "-d"
  ],
  "inputs": {
    "credentials": {
      "type": "File",
      "inputBinding": {
        "position": 1,
        "prefix": "--credentials"
      }
    },
    "image": {
      "type": "string",
      "inputBinding": {
        "prefix": "--name",
        "position": 2
      }
    }
  },
  "outputs": {
    "output": {
      "type": "File[]",
      "outputBinding": {
        "glob": "$(runtime.outdir)/*.zip"
      }
    }
  }
}

In [144]:
deploy_response = wclient.deploy(process_id='sentinelsat_download', cwl=sentinelsat_download_process_CWL)

In [106]:
print_available_processes()

=== Available Processes ===
['file_index_selector',
 'file2string_array',
 'jsonarray2netcdf',
 'metalink2netcdf',
 'sentinelsat_download']
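The outputs printed further down in this notebook show that client calls return an `OperationResult` with `success`, `code` and `message` fields. Assuming `deploy()` returns the same kind of object, a small guard like the sketch below can stop the notebook early when a call fails. The helper name `ensure_success` is hypothetical, and a `SimpleNamespace` stands in for the real result so the example runs without a Weaver instance.

```python
from types import SimpleNamespace


def ensure_success(result, action):
    """Raise if a client call did not succeed, otherwise return the result."""
    if not getattr(result, "success", False):
        code = getattr(result, "code", "?")
        message = getattr(result, "message", "")
        raise RuntimeError(f"{action} failed ({code}): {message}")
    return result


# Stand-in objects with the same three fields as the results printed in this notebook.
ok_result = SimpleNamespace(success=True, code=200, message="ok")
ensure_success(ok_result, "deploy")

failed_result = SimpleNamespace(success=False, code=500, message="error")
try:
    ensure_success(failed_result, "deploy")
except RuntimeError as error:
    print(error)
```

In the notebook, this would be called as `ensure_success(deploy_response, "deploy")`.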


## 3. Execute the process on Weaver
### Execution definition
[Execute Documentation](https://pavics-weaver.readthedocs.io/en/latest/autoapi/weaver/cli/index.html#weaver.cli.WeaverClient.execute) </br>
For the execution, Weaver expects a payload containing the inputs.
Because our process downloads Sentinel geospatial images, we need two inputs :
1. <b> The credentials file</b> 

2. <b>The name of the image to be downloaded</b>

#### 1. Credential file 
Here we create a temporary file containing the Copernicus credentials, to be sent as input to our download process.

In [3]:
# Saving a temporary credentials file to be uploaded to the Weaver vault
import tempfile
from tempfile import NamedTemporaryFile
from getpass import getpass


# Getting credentials for sentinelsat process
print(" Enter your copernicus credentials here :")
credentials = {
    "username": input("\tUsername: "),
    "password": getpass("\tPassword: ")
}


# Save credentials as file 
credentials_file = NamedTemporaryFile("w",
                                       encoding="UTF-8",
                                       suffix=".json",
                                       dir="./demo/")

json.dump(credentials, credentials_file)
credentials_file.flush()

credentials_file_path = credentials_file.name

 Enter your copernicus credentials here :
	Username: <USER_NAME>
	Password: ········
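Because the file created by `NamedTemporaryFile` is deleted when it is closed, it has to stay open until the execution request has been sent. The sketch below shows that lifetime with a `with` block. It is an illustration under assumptions: the helper name `write_credentials` and the dummy credentials are hypothetical, the default temporary directory is used instead of `./demo/`, and reopening the file by name while it is still open works on POSIX systems but may not on Windows.

```python
import json
import os
from tempfile import NamedTemporaryFile


def write_credentials(credentials, directory=None):
    """Write credentials to a temporary JSON file and return the open file object."""
    handle = NamedTemporaryFile("w", encoding="UTF-8", suffix=".json", dir=directory)
    json.dump(credentials, handle)
    handle.flush()  # make sure the content is on disk before anyone reads it
    return handle


# Dummy credentials, for illustration only.
with write_credentials({"username": "demo-user", "password": "demo-pass"}) as handle:
    path = handle.name
    # The file exists and is readable for as long as the handle stays open.
    with open(path, encoding="UTF-8") as check:
        stored = json.load(check)
    exists_inside = os.path.exists(path)

# Leaving the `with` block closes the handle, which deletes the file.
exists_after = os.path.exists(path)
print(exists_inside, exists_after)
```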


#### 2.  Image name  

In [131]:
# Specify the image product name for the process to download
product = 'S1A_IW_GRDH_1SDV_20230730T223634_20230730T223650_049659_05F8B3_4FE8'

### Process Payload

In [4]:
# process payload containing the input
sentinelsat_execute_payload = {
  "inputs": [
    {
      "id": "credentials",
      "href": f'{credentials_file_path}',
      "type": 'application/json',
    },
    {
      "id": "image",
      "data": product,
    }
  ]
}

####  Weaver Vault
- [Uploading File to the Vault](https://pavics-weaver.readthedocs.io/en/latest/processes.html#uploading-file-to-the-vault) 
- [File Vault Inputs](https://pavics-weaver.readthedocs.io/en/latest/processes.html#file-vault-inputs)
</br>

Because credentials must stay secret and must not be logged, we will use the [Weaver Vault](https://pavics-weaver.readthedocs.io/en/latest/appendix.html#term-Vault).<br>
In fact, Weaver does this on its own. 

Essentially, the path `<path/to/tmp_credentials.json>` passed as the `--credentials` input is converted to a `vault://<UUID>` reference for the execution. 

Weaver detects that the file has to be pushed to its Vault and does so: it hashes and uploads the file into the Vault, then allows a single read before deleting it. The file can be opened and read because Weaver uses a token system: the token acts as the key to the Vault and is handed to the executing job together with the Vault UUID reference.
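To illustrate which inputs are candidates for this conversion, the sketch below separates local file references from remote ones by looking at the URL scheme. This is not Weaver's actual implementation, only a simplified guess at the idea: references without a scheme, or with the `file` scheme, are treated as local. The helper names and the sample payload are hypothetical.

```python
from urllib.parse import urlparse


def is_local_file_reference(href):
    """Return True when the reference has no scheme or uses the file scheme."""
    return urlparse(href).scheme in ("", "file")


def local_file_inputs(payload):
    """List the IDs of inputs whose href points to a local file."""
    return [
        item["id"]
        for item in payload["inputs"]
        if "href" in item and is_local_file_reference(item["href"])
    ]


# Sample payload with placeholder values, for illustration only.
sample_payload = {
    "inputs": [
        {"id": "credentials", "href": "./demo/tmp_credentials.json", "type": "application/json"},
        {"id": "image", "data": "S1A_EXAMPLE"},
    ]
}
print(local_file_inputs(sample_payload))
print(is_local_file_reference("vault://de0641b5-5689-4814-803f-5d8f364284e4"))
```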

### Executing the process
This is where we ask Weaver to execute the <b>Sentinelsat_download_process.yml</b> by giving it the payload containing the inputs corresponding to the CWL process.

In [143]:
# Executing the download process
execute_response = wclient.execute(process_id="sentinelsat_download", inputs=sentinelsat_execute_payload)

In [134]:
pprint(execute_response.body)

{'description': 'Job successfully submitted to processing queue. Execution '
                'should begin when resources are available.',
 'jobID': 'd0356ec3-0da7-4d7d-9caf-12159999947d',
 'location': 'https://pavics.ouranos.ca/weaver/processes/sentinelsat_download/jobs/d0356ec3-0da7-4d7d-9caf-12159999947d',
 'processID': 'sentinelsat_download',
 'status': 'accepted'}


In [135]:
# This will delete the temporary credentials file
credentials_file.close()

#### Validation
Let's validate the inputs passed to our execution.

In [136]:
inputs_response = requests.get(execute_response.body['location']+"/inputs")

In [137]:
pprint(inputs_response.json()['inputs'])

{'credentials': {'$schema': 'https://schemas.opengis.net/ogcapi/processes/part1/1.0/openapi/schemas/link.yaml',
                 'href': 'vault://de0641b5-5689-4814-803f-5d8f364284e4',
                 'type': 'text/plain'},
 'image': 'S1A_IW_GRDH_1SDV_20230730T223634_20230730T223650_049659_05F8B3_4FE8'}


## 4. Monitor the status of a running process
[Status Documentation](https://pavics-weaver.readthedocs.io/en/latest/autoapi/weaver/cli/index.html#weaver.cli.WeaverClient.status) <br>
Weaver allows the user to follow the process execution.

In [74]:
jobID = execute_response.body['jobID']
print("jobID :", jobID)

jobID : ebe9042a-2137-494a-8529-943420e8b4c1


In [75]:
status_response = wclient.status(job_reference=jobID)

In [141]:
# pprint(status_response)
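A single status call only gives a snapshot. To wait until the job finishes, the status can be polled in a loop, as in the sketch below. It is written against a generic `get_status` callable so that it runs without a Weaver instance. The helper name `wait_for_job` and the set of terminal status values are assumptions; only `accepted` appears in the outputs of this notebook.

```python
import time

# Assumption: status values that mean the job will not change anymore.
TERMINAL_STATUSES = {"succeeded", "successful", "failed", "dismissed"}


def wait_for_job(get_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll get_status() until a terminal status is returned or the timeout is reached."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"Job still '{status}' after {timeout} seconds")
        sleep(interval)
        waited += interval


# Simulated sequence of statuses, for illustration only.
fake_statuses = iter(["accepted", "running", "succeeded"])
final_status = wait_for_job(
    lambda: next(fake_statuses), interval=0.0, sleep=lambda seconds: None
)
print(final_status)
```

With the real client, `get_status` could be something like `lambda: wclient.status(job_reference=jobID).body["status"]`, assuming the status document exposes a `status` field the way the execution response above does.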

## 5. Access the result of an executed process
[Results Documentation](https://pavics-weaver.readthedocs.io/en/latest/autoapi/weaver/cli/index.html#weaver.cli.WeaverClient.results) <br>
This shows the result of our process. The command also lets the user download the output file produced by the process by passing ```out_dir``` as an argument to the ```results()``` function.

In [89]:
result_response = wclient.results(job_reference=jobID)

In [140]:
result_response

OperationResult(success=True, code=200, message="Listing job results.")
{
  "output": {
    "href": "https://pavics.ouranos.ca/wpsoutputs/ebe9042a-2137-494a-8529-943420e8b4c1/output/S1A_IW_GRDH_1SDV_20230730T223634_20230730T223650_049659_05F8B3_4FE8.zip",
    "type": "text/plain",
    "format": {
      "mediaType": "text/plain"
    }
  }
}

In [95]:
# The downloaded product on the machine can be found here 
print("Path to the downloaded product:",result_response.body['output']['href'])

Path to the downloaded product: https://pavics.ouranos.ca/wpsoutputs/ebe9042a-2137-494a-8529-943420e8b4c1/output/S1A_IW_GRDH_1SDV_20230730T223634_20230730T223650_049659_05F8B3_4FE8.zip
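The results body maps each output ID to a link. As a small sketch of how to work with it, the helper below extracts the file name from each `href`, and also accepts a list of links per output since the CWL output is declared as `File[]`. The helper names are hypothetical, and the sample body copies the output shown above.

```python
import posixpath
from urllib.parse import urlparse


def output_filename(href):
    """Return the last path segment of a result link."""
    return posixpath.basename(urlparse(href).path)


def list_output_files(results_body):
    """Map each output ID to the file names found in its link(s)."""
    files = {}
    for output_id, value in results_body.items():
        entries = value if isinstance(value, list) else [value]
        files[output_id] = [
            output_filename(entry["href"]) for entry in entries if "href" in entry
        ]
    return files


# Sample body copied from the results output above.
sample_body = {
    "output": {
        "href": "https://pavics.ouranos.ca/wpsoutputs/ebe9042a-2137-494a-8529-943420e8b4c1/output/S1A_IW_GRDH_1SDV_20230730T223634_20230730T223650_049659_05F8B3_4FE8.zip",
        "type": "text/plain",
    }
}
print(list_output_files(sample_body))
```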


## Undeploy
[Undeploy Documentation](https://pavics-weaver.readthedocs.io/en/latest/autoapi/weaver/cli/index.html#weaver.cli.WeaverClient.undeploy)</br>

Use this method when you need to undeploy the process. </br>
Simply specify the `process_id`.

In [96]:
wclient.undeploy(process_id="sentinelsat_download")

OperationResult(success=True, code=200, message="Process successfully undeployed.")
{
  "description": "Process successfully undeployed.",
  "identifier": "sentinelsat_download",
  "undeploymentDone": true
}

In [97]:
print_available_processes()

=== Available Processes ===
['file_index_selector',
 'file2string_array',
 'jsonarray2netcdf',
 'metalink2netcdf']
