Get a workflow from the Workflow Hub
================================

Before running this notebook, bring up a Workflow Hub instance:

`docker run -d -p 3000:3000 --name seek fairdom/seek:workflow`

Then use the web interface to create the first user, who also becomes the admin. This notebook assumes:

* username: simleo
* password: 0123456789

The next step is to enable the workflows feature, which is disabled by default. In the web interface, click the top-right menu, then "Server admin", then "Enable/disable features", then scroll down to "SEEK features" and click "Workflows enabled".

Now use the web interface to create two workflows (in the top-left menu, click "Create" and then "Workflow"). This notebook assumes they are created from:

1. https://raw.githubusercontent.com/common-workflow-language/common-workflow-language/master/v1.0/examples/1st-workflow.cwl
2. The `Galaxy-Workflow-Peaks_to_Gene_names___counts.ga` workflow from https://usegalaxy.eu/workflows/list_published ("Peaks to Gene names & counts" -> "Save as File")

Create them in the order listed above, so that the CWL workflow gets id 1 and the Galaxy workflow gets id 2. Associate both workflows with the Default Project (this should be the default option). Manually set the title of the CWL workflow to "1st-workflow".

This notebook uses the JSON API to retrieve the workflow from the Workflow Hub instance.

JSON API examples are available at https://github.com/seek4science/seekAPIexamples

In [1]:
import requests
from urllib.request import urlretrieve
import pprint

In [2]:
pp = pprint.PrettyPrinter(indent=4)
base_url = 'http://localhost:3000'
headers = {
    "Content-type": "application/vnd.api+json",
    "Accept": "application/vnd.api+json",
    "Accept-Charset": "ISO-8859-1"
}
session = requests.Session()
session.headers.update(headers)
session.auth = "simleo", "0123456789"

In [3]:
r = session.get(base_url + "/workflows")
r.raise_for_status()
print("ALL WORKFLOWS:")
pp.pprint(r.json())

ALL WORKFLOWS:
{   'data': [   {   'attributes': {'title': 'Peaks to Gene names & counts'},
                    'id': '2',
                    'links': {'self': '/workflows/2'},
                    'type': 'workflows'},
                {   'attributes': {'title': '1st-workflow'},
                    'id': '1',
                    'links': {'self': '/workflows/1'},
                    'type': 'workflows'}],
    'jsonapi': {'version': '1.0'},
    'links': {'self': '/workflows'},
    'meta': {'api_version': '0.2', 'base_url': 'http://localhost:3000'}}
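The listing returns each workflow's `id` alongside its title, so the id does not have to be hard-coded. The following is a minimal sketch of a lookup by title; `find_workflow_id` is a hypothetical helper (not part of the SEEK API), and the sample payload is a trimmed copy of the output above.

```python
def find_workflow_id(listing, title):
    """Return the id of the first workflow with the given title, or None."""
    for item in listing.get("data", []):
        if item.get("attributes", {}).get("title") == title:
            return item["id"]
    return None


# Trimmed copy of the JSON returned by GET /workflows in the cell above.
listing = {
    "data": [
        {"attributes": {"title": "Peaks to Gene names & counts"},
         "id": "2", "type": "workflows"},
        {"attributes": {"title": "1st-workflow"},
         "id": "1", "type": "workflows"},
    ]
}

print(find_workflow_id(listing, "Peaks to Gene names & counts"))  # prints 2
```

In the notebook, the same call could take `r.json()` as its first argument.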


In [4]:
# Note: retrieving the CWL workflow (id 1) raised a server error,
# so the Galaxy workflow (id 2) is used here.
wf_id = "2"
r = session.get(base_url + "/workflows/%s" % wf_id)
r.raise_for_status()
data = r.json()["data"]
print("WORKFLOW %s:" % wf_id)
pp.pprint(r.json())
print()

WORKFLOW 2:
{   'data': {   'attributes': {   'content_blobs': [   {   'content_type': 'application/octet-stream',
                                                           'link': 'http://localhost:3000/workflows/2/content_blobs/2',
                                                           'md5sum': 'ac2f0978b95ab3b8597b45d90a936299',
                                                           'original_filename': 'Galaxy-Workflow-Peaks_to_Gene_names___counts.ga',
                                                           'sha1sum': 'f8922c3279435ce9d9f32e1b5c018d06bf23182e',
                                                           'size': 13275,
                                                           'url': None}],
                                  'created_at': '2020-03-13T10:09:08.745Z',
                                  'description': None,
                                  'internals': None,
                                  'latest_version': 1,
                            

In [5]:
workflow_class = data["attributes"]["workflow_class"]["key"]
print("WORKFLOW CLASS: %s" % workflow_class)
blob = data["attributes"]["content_blobs"][0]
original_filename = blob["original_filename"]
r = session.get(blob["link"])  # this is a full URL
r.raise_for_status()
data = r.json()["data"]
print("CONTENT BLOB FOR %s:" % original_filename)
pp.pprint(r.json())
print()

WORKFLOW CLASS: Galaxy
CONTENT BLOB FOR Galaxy-Workflow-Peaks_to_Gene_names___counts.ga:
{   'data': {   'attributes': {   'content_type': 'application/octet-stream',
                                  'md5sum': 'ac2f0978b95ab3b8597b45d90a936299',
                                  'original_filename': 'Galaxy-Workflow-Peaks_to_Gene_names___counts.ga',
                                  'sha1sum': 'f8922c3279435ce9d9f32e1b5c018d06bf23182e',
                                  'size': 13275,
                                  'url': None},
                'id': '2',
                'links': {   'download': '/workflows/2/content_blobs/2/download',
                             'self': '/workflows/2/content_blobs/2'},
                'meta': {   'api_version': '0.2',
                            'base_url': 'http://localhost:3000',
                            'created': '2020-03-13T10:08:45.102Z',
                            'modified': '2020-03-13T10:09:08.738Z',
                            'uui
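The responses above mix absolute links (the content blob `link`) with relative ones (`links.self`, `links.download`). As a sketch, both kinds can be handled uniformly with the standard library's `urljoin`; `absolute_url` is a hypothetical helper name introduced here for illustration.

```python
from urllib.parse import urljoin


def absolute_url(base_url, link):
    """Resolve a link against base_url; absolute links are returned unchanged."""
    return urljoin(base_url, link)


base_url = "http://localhost:3000"

# Relative link, as found under data["links"]["download"].
print(absolute_url(base_url, "/workflows/2/content_blobs/2/download"))

# Absolute link, as found under blob["link"].
print(absolute_url(base_url, "http://localhost:3000/workflows/2/content_blobs/2"))
```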

In [6]:
print("downloading workflow content to %s" % original_filename)
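# links.download is a relative path, so prepend base_url.
# Note: urlretrieve does not go through the session, so no credentials are sent.
file_url = base_url + data["links"]["download"]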
urlretrieve(file_url, original_filename)

downloading workflow content to Galaxy-Workflow-Peaks_to_Gene_names___counts.ga


('Galaxy-Workflow-Peaks_to_Gene_names___counts.ga',
 <http.client.HTTPMessage at 0x7fa3f37eb518>)
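Two things may be worth adding to the download step: sending the request through the authenticated session (which matters if the workflow is not publicly readable), and checking the file against the `md5sum` reported in the content blob metadata. The sketch below shows one way to do both. `download_blob`, `file_md5` and `matches_blob_md5` are hypothetical helpers, and overriding the `Accept` header for the download is a precaution assumed here, not something the SEEK documentation is known to require.

```python
import hashlib


def download_blob(session, url, path, chunk_size=65536):
    """Stream url to path through a requests session (credentials included)."""
    # Assumption: the download endpoint may not serve the JSON API media
    # type, so the session-wide Accept header is overridden for this request.
    with session.get(url, stream=True, headers={"Accept": "*/*"}) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def matches_blob_md5(path, blob):
    """Compare a file's MD5 digest with the md5sum field of a content blob."""
    return file_md5(path) == blob["md5sum"]
```

With the notebook's variables, this would be called as `download_blob(session, file_url, original_filename)` followed by `matches_blob_md5(original_filename, blob)`, where `blob` is the entry taken from `content_blobs` in cell 5.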