Common Workflow Language (CWL) support? #45

olgabot · 2016-11-14T18:57:52Z

Hello! This is a very interesting project. In case you haven't seen it, there's a project called Common Workflow Language (CWL) that attempts to create a single document specifying a pipeline workflow that can be parsed by a multitude of programs so your pipeline can be run portably on the cloud, a laptop, a server, etc.

Wanted to let you know about other people in the reproducibility space :)

thejmazz · 2016-11-15T16:30:19Z

hi! thanks for the feedback!

we are well aware of CWL. while this project was being developed (I had 12 weeks, 4 of which went to deciding what to do), we were looking into CWL, but the spec had not been finalized and time was tight, so using CWL would have been too much for the MVP.

now that 1.0 is out, we are definitely going to spend time investigating how to integrate.

Some ideas:

serialize dynamic pipeline into CWL files
CWL definition -> watermill task
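As a rough illustration of the first idea, here is a minimal sketch of serializing a task into a CWL v1.0 CommandLineTool document. The `task` dictionary layout and the `task_to_cwl` helper are hypothetical and are not watermill's actual task format; only the emitted field names (`cwlVersion`, `class`, `baseCommand`, `inputs`, `inputBinding`, `stdout`, `outputs`) come from the CWL spec. CWL documents may be written as JSON, so no YAML library is needed.

```python
import json


def task_to_cwl(task):
    """Convert a hypothetical task description into a CWL v1.0 tool document."""
    # Each input becomes a positional command line argument, in declaration order.
    inputs = {
        name: {"type": cwl_type, "inputBinding": {"position": position}}
        for position, (name, cwl_type) in enumerate(task["inputs"].items(), start=1)
    }
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": task["command"],
        "inputs": inputs,
        # Capture standard output into a named file and expose it as an output.
        "stdout": task["stdout"],
        "outputs": {"output": {"type": "stdout"}},
    }


# Hypothetical task: count the lines of a file with `wc -l`.
task = {
    "command": ["wc", "-l"],
    "inputs": {"infile": "File"},
    "stdout": "count.txt",
}

cwl_doc = task_to_cwl(task)
print(json.dumps(cwl_doc, indent=2))
```

The second idea would be the reverse mapping: reading such a document and building a task from its `baseCommand` and `inputs`.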

I think CWL and watermill can work together, rather than be alternatives

They each have different objectives.

Watermill lets you orchestrate an entire pipeline composed of tasks from a high level, while CWL seems to be about strictly defining tasks that belong to some pipeline, strictly in the sense that filenames are "hardcoded".

Perhaps the logging output of a watermill pipeline could be a bunch of CWL files with absolute filenames baked in, for example. One institution could run a pipeline on their cluster, then have its execution dumped into CWL files. Others could then run this "baked" pipeline (taking the term from computer graphics, where you "bake" a texture with shadows, for example), assuming the files are in place.
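One way to picture the "baking" step is as a function that takes the inputs a run actually used and resolves every file path to an absolute one, producing a CWL input object (the JSON/YAML job file a CWL runner accepts alongside a tool or workflow). This is only a sketch: `bake_job`, the example paths, and the `threads` parameter are hypothetical, while the `class: File` / `path` shape follows the CWL File object.

```python
import json
import os


def bake_job(inputs, base_dir):
    """Return a CWL input object with every File path made absolute."""
    job = {}
    for name, value in inputs.items():
        if isinstance(value, dict) and value.get("class") == "File":
            absolute = os.path.abspath(os.path.join(base_dir, value["path"]))
            job[name] = {"class": "File", "path": absolute}
        else:
            # Non-file parameters (numbers, strings, ...) are copied unchanged.
            job[name] = value
    return job


# Hypothetical record of what one pipeline run consumed.
run_inputs = {
    "infile": {"class": "File", "path": "data/reads.txt"},
    "threads": 4,
}

baked = bake_job(run_inputs, "/project/run-01")
print(json.dumps(baked, indent=2))
```

Anyone re-running with this job file would need the same files at the same absolute locations, which is the trade-off the baking idea accepts.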

Integration will be really important because maybe CWL can handle cluster usage, AWS usage, etc on its own and so we don't have to implement that.

Correct me on CWL assumptions if I'm wrong (@ everyone reading this) - I want to have wrong assumptions on CWL which come from a lack of reading its docs / understanding it, and say things that annoy people who know it well, and then be corrected, and we can all have a happy discussion!

Something to get CWL lovers steamed:

is CWL biased towards Galaxy? if so, that is a failure of the spec imo.

thejmazz · 2016-11-15T16:34:37Z

cc @tetron @mr-c @pditommaso

bmpvieira · 2016-11-15T16:54:23Z

Thanks @olgabot :)

I've been aware of CWL since BOSC2015 and discussed with @tetron at Biohackathon 2015 how it could be used with bionode.

One thing I'm looking forward to is using CWL-wrapped bioinformatics tools in a watermill pipeline, because wrapping is a pain and I just want to deal with JSON objects in and out of an existing tool (e.g., samtools). We should give it a try once more wrapped tools become available.

Cheers

mr-c · 2016-11-15T17:00:03Z

Hello again @olgabot and @bmpvieira; nice to meet you @thejmazz.

CWL is a standard, not a platform, and it has two specifications: one for describing command line tools, another for describing workflows made from those command line tools.

We don't see ourselves as being in competition with anyone -- our goal is to enable more tools and platforms to communicate and interoperate.

For maximum composability, users are encouraged to keep these in separate files and refer to the individual tool descriptions using an identifier, often a relative path but it could be something more portable. However, a CWL document and all of its referenced CWL parts can be 'packed' into a single file.
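To make the separate-files arrangement concrete, the sketch below builds a one-step CWL v1.0 Workflow whose step points at a tool description by relative path through the `run` field. The file name `wc-tool.cwl`, the step name `step1`, and the `single_step_workflow` helper are assumptions made for illustration; the tool file is assumed to declare an input named `infile` and an output named `output`.

```python
import json


def single_step_workflow(tool_path, tool_input, tool_output):
    """Build a CWL v1.0 Workflow with one step that runs an external tool file."""
    return {
        "cwlVersion": "v1.0",
        "class": "Workflow",
        "inputs": {"infile": "File"},
        "outputs": {
            # Expose the step's output as the workflow's result.
            "result": {"type": "File", "outputSource": "step1/" + tool_output}
        },
        "steps": {
            "step1": {
                # Relative path to the tool description kept in its own file.
                "run": tool_path,
                # Wire the workflow-level input to the tool's input parameter.
                "in": {tool_input: "infile"},
                "out": [tool_output],
            }
        },
    }


workflow = single_step_workflow("wc-tool.cwl", "infile", "output")
print(json.dumps(workflow, indent=2))
```

Packing would replace the `run` path with the referenced tool document itself, so that the whole workflow travels as one file.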

Nothing in CWL requires the use of filenames. It is a personal priority to encourage a move away from make-style overloading of filenames with multidimensional metadata, through better tooling and systems.

The CWL standards obviously don't run on the cloud, but many of the current implementations do so; we designed it to be mindful of platforms targeting unified filesystems or "shared nothing" filesystems.

The CWL project has four co-founders: two from academia, two from companies that produce commercial open source software. Yes, one of those is from the Galaxy project; but our goal for the project was for CWL to be a catalyst for all platforms to co-evolve with and towards. We certainly learned from the Galaxy project's decade-plus experience, along with influences from many other perspectives.

FYI: before the v1.0 spec was released we supported marking inputs and outputs as being streaming capable, where supported by the underlying tools. So big ❤️ to this project's focus on likewise avoiding unnecessary writing to disk :-)

I'd be happy to schedule a video chat open to the public so that interested people from bionode and CWL can chat in real time. Just let me know!

thejmazz · 2016-11-15T21:24:08Z

Nice to meet you as well @mr-c :) Great project summary! Hope my uninformed comments were not taken in a bad way!

"Nothing in CWL requires the use of filenames"

❤️ ❤️ ❤️

Would love to have a chat as well. Should read up a bit more on the CWL spec myself first though.

thejmazz added CWL discussion labels Nov 15, 2016

bmpvieira added the mozsprint label Apr 24, 2017

bmpvieira mentioned this issue May 24, 2017

Mozilla's Global Sprint (June 1st and 2nd 2017) bionode/bionode#44

Closed

17 tasks

thejmazz mentioned this issue Aug 25, 2017

Tasks as npm modules #78

Open
