Test a workflow from the Workflow Hub
=====================================

This notebook shows how to interact with the Workflow Hub API. It also calls a script that demonstrates how to use the test metadata to run tests on a minimal Workflow RO-Crate.

Before running this notebook, bring up a Workflow Hub instance:

`docker run -d -p 3000:3000 --name wfhub fairdom/seek:workflow`
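The container may take some time before the web interface answers on port 3000. The sketch below polls the instance with the standard library until it responds. `wait_for_server` is a hypothetical helper, not part of the Workflow Hub or of this notebook's scripts, and the default timeout is an arbitrary choice.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout=120.0, interval=2.0):
    """Poll url until the server sends any HTTP response, or time out.

    Returns True as soon as the server answers (even an HTTP error page
    means the application is up), False if timeout seconds elapse first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered with a non-2xx status: it is running.
            return True
        except OSError:
            # Connection refused/reset or timed out while the container boots.
            time.sleep(interval)
    return False
```

For example, `wait_for_server("http://localhost:3000")` should return `True` once the instance started above is ready.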

Then use the web interface (http://localhost:3000) to create the first user, who will also be the administrator. This notebook assumes the following credentials:

* username: simleo
* password: 0123456789

Next, enable the workflows feature, which is disabled by default: in the web interface, open the top-right menu, click "Server admin", then "Enable/disable features", scroll down to "SEEK features" and click "Workflows enabled". Optionally, set https://view.commonwl.org/ as the CWL Viewer URL. Finally, scroll all the way down and click "Update" to apply the changes.

Minimal Workflow RO-Crate examples are available in `data/crates`. To be imported into the Workflow Hub, each crate must be zipped with its contents at the top level of the archive (not inside a subdirectory) and given a `.crate.zip` extension. Run `bash data/zip_crates.sh` to create the zipped crates.
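If you prefer to stay in Python, a rough equivalent of the zipping step is sketched below. It is a hypothetical stand-in for `data/zip_crates.sh`, not a copy of it: it assumes each crate is a subdirectory of `data/crates` and writes `<name>.crate.zip` next to it, which may differ from where the real script puts the archives.

```python
import os
import zipfile


def zip_crates(crates_dir):
    """Zip every crate directory in crates_dir into <name>.crate.zip.

    Entries are stored relative to the crate directory itself, so the
    crate's contents (e.g. its metadata file) sit at the top level of the
    archive instead of inside a <name>/ folder.
    """
    created = []
    for name in sorted(os.listdir(crates_dir)):
        src = os.path.join(crates_dir, name)
        if not os.path.isdir(src):
            continue  # skip plain files, including previously built zips
        zip_path = os.path.join(crates_dir, name + ".crate.zip")
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for root, _dirs, files in os.walk(src):
                for fn in sorted(files):
                    full = os.path.join(root, fn)
                    # arcname relative to src keeps contents at the top level
                    zf.write(full, os.path.relpath(full, src))
        created.append(zip_path)
    return created
```

For example, `zip_crates("data/crates")` returns the paths of the archives it wrote.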

Now use the web interface to upload the `ro-crate-cwl-basefreqsum.crate.zip` workflow to the Workflow Hub (in the top-left menu, click "Create", then "Workflow", then choose the "Advanced (Workflow RO Crate)" option). The importer should auto-detect the title (`basefreqsum`). Associate the workflow with the Default Project (this should already be selected).

This notebook uses the JSON API to retrieve the workflow from the Workflow Hub instance. It then downloads the workflow's RO-Crate, unpacks it and runs `check_cwl.py` on it (Planemo must be installed: `pip install planemo`).

JSON API examples are available at https://github.com/seek4science/seekAPIexamples. The Workflow Hub API docs are available at https://workflowhub.eu/api.

In [1]:
import os
import pprint
import requests
import shutil
import tempfile
import zipfile
from urllib.request import urlretrieve

In [2]:
pp = pprint.PrettyPrinter(indent=4)
base_url = 'http://localhost:3000'
# The API speaks JSON:API, so request and send its media type
headers = {
    "Content-type": "application/vnd.api+json",
    "Accept": "application/vnd.api+json",
    "Accept-Charset": "ISO-8859-1"
}
session = requests.Session()
session.headers.update(headers)
# HTTP basic auth with the admin user created above
session.auth = ("simleo", "0123456789")

In [3]:
r = session.get(base_url + "/workflows")
r.raise_for_status()
data = r.json()["data"]
print("ALL WORKFLOWS:")
pp.pprint(data)

ALL WORKFLOWS:
[   {   'attributes': {'title': 'basefreqsum'},
        'id': '1',
        'links': {'self': '/workflows/1'},
        'type': 'workflows'}]


In [4]:
wf_id = [wf["id"] for wf in data if wf["attributes"]["title"] == "basefreqsum"][0]
print("wf_id =", wf_id)

wf_id = 1
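Indexing `[0]` fails with a bare `IndexError` if no workflow has the expected title, and silently picks one if several do (titles may not be unique). A slightly more defensive lookup is sketched below; `find_workflow_id` is a hypothetical helper operating on the `data` list returned by `/workflows`. Note that ids come back as strings.

```python
def find_workflow_id(workflows, title):
    """Return the id of the single workflow whose title matches exactly.

    Raises LookupError with a readable message if there is no match or
    more than one match.
    """
    matches = [wf["id"] for wf in workflows if wf["attributes"]["title"] == title]
    if not matches:
        raise LookupError("no workflow titled %r" % title)
    if len(matches) > 1:
        raise LookupError("several workflows titled %r: %s" % (title, matches))
    return matches[0]
```

With the listing shown in cell 3, `find_workflow_id(data, "basefreqsum")` returns `'1'`.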


In [5]:
r = session.get(base_url + "/workflows/%s" % wf_id)
r.raise_for_status()
data = r.json()["data"]
print("WORKFLOW %s:" % wf_id)
pp.pprint(r.json())
print()

WORKFLOW 1:
{   'data': {   'attributes': {   'content_blobs': [   {   'content_type': 'application/zip',
                                                           'link': 'http://localhost:3000/workflows/1/content_blobs/1',
                                                           'md5sum': 'fe1412054610ee65d2493ea72e6f61dc',
                                                           'original_filename': 'ro-crate-cwl-basefreqsum.crate.zip',
                                                           'sha1sum': 'e2fc46ae9411f7aa6a25b2cb19d0a4c0dd0b0b58',
                                                           'size': 9256,
                                                           'url': None}],
                                  'created_at': '2020-09-15T07:54:33.511Z',
                                  'description': 'compute base frequencies in '
                                                 'a FASTA file',
                                  'discussion_links': None,
         

In [6]:
workflow_class = data["attributes"]["workflow_class"]["key"]
print("WORKFLOW CLASS: %s" % workflow_class)
blob = data["attributes"]["content_blobs"][0]
original_filename = blob["original_filename"]
r = session.get(blob["link"])  # this is a full URL
r.raise_for_status()
data = r.json()["data"]
print("CONTENT BLOB FOR %s:" % original_filename)
pp.pprint(r.json())
print()

WORKFLOW CLASS: CWL
CONTENT BLOB FOR ro-crate-cwl-basefreqsum.crate.zip:
{   'data': {   'attributes': {   'content_type': 'application/zip',
                                  'md5sum': 'fe1412054610ee65d2493ea72e6f61dc',
                                  'original_filename': 'ro-crate-cwl-basefreqsum.crate.zip',
                                  'sha1sum': 'e2fc46ae9411f7aa6a25b2cb19d0a4c0dd0b0b58',
                                  'size': 9256,
                                  'url': None},
                'id': '1',
                'links': {   'download': '/workflows/1/content_blobs/1/download',
                             'self': '/workflows/1/content_blobs/1'},
                'meta': {   'api_version': '0.3',
                            'base_url': 'http://localhost:3000',
                            'created': '2020-09-15T07:54:20.765Z',
                            'modified': '2020-09-15T07:54:33.487Z',
                            'uuid': '9648fe00-d956-0138-db2b-0242ac1100
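The content blob attributes above include the archive's `size`, `md5sum` and `sha1sum`, which can be used to check the file once it has been downloaded. A minimal sketch follows; `verify_blob` is a hypothetical helper that takes the blob's `attributes` dict and raises on any mismatch.

```python
import hashlib


def verify_blob(path, attributes, chunk_size=1 << 16):
    """Check a downloaded file against a content blob's size and checksums.

    attributes is the 'attributes' dict of the content blob resource,
    with 'size', 'md5sum' and 'sha1sum' keys. Raises ValueError on mismatch.
    """
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    size = 0
    with open(path, "rb") as f:
        # Hash in chunks so large crates need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            size += len(chunk)
    problems = []
    if size != attributes["size"]:
        problems.append("size %d != %d" % (size, attributes["size"]))
    if md5.hexdigest() != attributes["md5sum"]:
        problems.append("md5 mismatch")
    if sha1.hexdigest() != attributes["sha1sum"]:
        problems.append("sha1 mismatch")
    if problems:
        raise ValueError("%s: %s" % (path, ", ".join(problems)))
```

After the download in the next cell, `verify_blob(crate_zip_path, data["attributes"])` should return without raising, since `data` still holds the content blob resource at that point.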

In [7]:
wd = tempfile.mkdtemp(prefix="ro_crate_test_")
crate_zip_path = os.path.join(wd, original_filename)
file_url = base_url + data["links"]["download"]
print("downloading workflow RO-Crate to %s" % crate_zip_path)
urlretrieve(file_url, crate_zip_path)

downloading workflow RO-Crate to /tmp/ro_crate_test_d4elkloo/ro-crate-cwl-basefreqsum.crate.zip


('/tmp/ro_crate_test_d4elkloo/ro-crate-cwl-basefreqsum.crate.zip',
 <http.client.HTTPMessage at 0x7fc5c845add8>)
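`urlretrieve` opens the URL without the session's credentials, which works here but would presumably fail for a workflow that is not publicly downloadable. A sketch of an authenticated alternative that streams through the `requests` session is shown below. `download` is a hypothetical helper, and overriding the session's JSON:API `Accept` header with `*/*` is an assumption that the download endpoint serves the raw archive.

```python
def download(session, url, dest, chunk_size=1 << 16):
    """Stream url to dest through a requests-style session.

    The session's auth and headers are sent, except Accept, which is
    overridden so the endpoint is not asked for JSON:API output.
    """
    with session.get(url, stream=True, headers={"Accept": "*/*"}) as r:
        r.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return dest
```

In cell 7, `download(session, file_url, crate_zip_path)` could replace the `urlretrieve` call.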

In [8]:
# e.g. ro-crate-cwl-basefreqsum.crate.zip -> ro-crate-cwl-basefreqsum
crate_dir_bn = os.path.basename(crate_zip_path).split(".", 1)[0]
crate_dir = os.path.join(wd, crate_dir_bn)
with zipfile.ZipFile(crate_zip_path, "r") as zipf:
    zipf.extractall(crate_dir)
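Once unpacked, the crate's metadata file identifies the main workflow through the root dataset's `mainEntity`. The sketch below looks it up; `find_main_entity` is a hypothetical helper, and it assumes the crate declares a `mainEntity` on its root dataset, as workflow crates generally do. It accepts both metadata file names, since RO-Crate 1.0 used `ro-crate-metadata.jsonld` and 1.1 renamed it to `ro-crate-metadata.json`.

```python
import json
import os

# RO-Crate 1.1 metadata file name first, then the 1.0 name.
METADATA_NAMES = ("ro-crate-metadata.json", "ro-crate-metadata.jsonld")


def find_main_entity(crate_dir):
    """Return the @id of the root dataset's mainEntity in an unpacked crate."""
    for name in METADATA_NAMES:
        path = os.path.join(crate_dir, name)
        if os.path.isfile(path):
            break
    else:
        raise FileNotFoundError("no RO-Crate metadata file in %s" % crate_dir)
    with open(path) as f:
        graph = json.load(f)["@graph"]
    entities = {e["@id"]: e for e in graph}
    # The metadata descriptor (whose @id is the file name) points to the root.
    root_id = entities[name]["about"]["@id"]
    return entities[root_id]["mainEntity"]["@id"]
```

For example, `find_main_entity(crate_dir)` should return the path of the CWL workflow inside the unpacked basefreqsum crate.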

In [9]:
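import sys
from argparse import Namespace

# check_cwl.py lives in the parent of the notebook's working directory
sys.path.append(os.path.dirname(os.getcwd()))
from check_cwl import main

# main() reads its options (here just crate_dir) from an argparse-style object
main(Namespace(crate_dir=crate_dir))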

RUNNING test1
test1: OK
RUNNING test2
test2: OK
