Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/export cwl #19

Merged
merged 28 commits into from
Aug 10, 2020
Merged

Feature/export cwl #19

merged 28 commits into from
Aug 10, 2020

Conversation

james-strauss-uwa
Copy link
Collaborator

Sorry, I've put this off for too long thinking I would make improvements, but decided to just create the pull request and get your feedback.

Changes:

  • Added a 'Export to CWL' button to the pg_viewer HTML page that is served by lg web on translation.
  • Added dlg.dropmake.cwl with cwl-related functionality
  • Create zip file containing multiple "cwl tool description" files and a single "cwl workflow" file.
  • Unit tests from Dave

Possible issues:

  • CWL only supports BashShellApp nodes. If a user exports a graph with nodes of other types, the feedback isn't great.
  • There are some CI tests that don't pass, but these remind me of the previous "timeout" issues you put down to messaging changes. Let me know if this is not correct.

james-strauss-uwa and others added 25 commits March 20, 2020 11:17
…n makes a GET request to server for CWL file. Note: at the moment, the reply is the original JSON of the graph, not its CWL export.
…WL file for the requested graph. Note: At the moment, the CWL response is a trivial placeholder workflow with no steps.
…t. The pg_spec contains more information and I hope it is sufficient to build a CWL.
…. The N command line tool descriptions plus the workflow description file are all added to a zip file for download.
… and command line description files. Generated CWL is almost executable.
… than use the path inherited from the server filesystem.
This removes these test-only dependencies from the daliuge-translator
package, which otherwise doesn't needs these.

There was also a conflict with the ruamel.yaml dependency as needed by
cwlgen (used by daliuge-translator) and cwltool (used by the tests),
where the version constraints on ruamel.yaml stated by each package are
not fully exclusive, but are different enough that pip will install a
version of ruamel.yaml that satisfies only one of the packages. By
specifying a specific version of this package we ensure both our
packages get what they need.

There was a second similar issue with Travis' pip not automatically
upgrading the "typing" package: version 3.6.6 comes pre-installed in the
virtual environment created by Travis, and the first package to require
it (cwltool) requires >= 3.5.3, and therefore the version installed is
deemed as safe. A later package (typing_extensions, required by cwltool
too) requires >= 3.7.4, however pip doesn't automatically upgrade the
installed version. This issues an "ERROR" message by pip, which still
exits with a 0 return code, and therefore the unit tests still run (and
fail).

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
This is different to common.b2s (always turns bytes into a str instance)
and six.b (always turns a str instance into bytes): this one turns a
text object (str in py3, unicode in 2.7) into a str object. One does not
usually need this, but we have now a use case.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
cwlgen doesn't know how to serialize unicode objects in python 2.7,
making our tests fail. This commit fixes that by turning unicode objects
into str instances before giving them to cwlgen.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
@coveralls
Copy link

coveralls commented Aug 5, 2020

Coverage Status

Coverage increased (+0.2%) to 75.511% when pulling 5d29079 on feature/export-cwl into ce0be8a on master.

Copy link
Contributor

@rtobar rtobar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks James, In general I think it look really good :).

Yes, the failures are all due to the timeout issue. That's on me to fix properly on the master branch, sorry for the noise.

I left a couple of observations throughout the code, mostly around simplifying a bit the translation process, plus other minor things. Let me know your thoughts on these and whether you think it's feasible to tackle them. Thanks again!

daliuge-common/dlg/clients.py Outdated Show resolved Hide resolved
daliuge-translator/dlg/dropmake/cwl.py Show resolved Hide resolved
daliuge-translator/dlg/dropmake/web/lg_web.py Outdated Show resolved Hide resolved
Comment on lines 298 to 300
# get the pgt
pgt = unroll(opts.lg_path, opts.oid_prefix, zerorun=opts.zerorun, app=apps[opts.app])
partition(pgt, opts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the CWL translation takes a physical graph as an input, I'd incline myself to reflect that better in the "dlg cwl" command. What I have in mind is to remove the unroll and partition steps, and let it accept a physical graph as input instead. This in turn makes it composable, so you can run "dlg unroll -L lg.json | dlg cwl" (one could even add a unit test for it, like the one already in test_tool.py).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: ideally it would be nice to follow the same idea of "the output goes directly to stdout by default" that the other commands have, but that would mean more changes to the code below so let's not do that yet. Could we however have a command-line option to specify the output file name? That should be simpler, and takes us a bit further ahead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change too. I'll change dlg cwl to accept a physical graph as input.

I guess we'll json.load() the physical graph file from the opts.pgt_path and then pass it to create_workflow(). But do we have to do anything to the JSON to make it a valid PGT to be passed to create_workflow(), perhaps parse the JSON into a PGT data structure?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmmm I don't think so, but I might be wrong. Maybe give it a try and see how that goes. If it fails we can have a look at what exactly the requirement is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does work. The PGT loaded from JSON is a list of drops/nodes and can be passed directly to create_workflow().

But, the PGT that is used in 'lg web' is slightly different. It is retrieved from the pg_mgr using pg_mgr.get_pgt() and the list of drops/nodes is within a 'drops' attribute of the PGT. So I pass pgt.drops to create_workflow() in this case. Maybe the difference is that in this case the structure is actually a PGTP not a PGT.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the ability to specify the output filename.

Copy link
Collaborator Author

@james-strauss-uwa james-strauss-uwa Aug 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can now specify the location of the CWL output from "dlg cwl", for example:

dlg unroll -L test.graph | dlg cwl -o outputs/test.zip

If the output location is not specified, then the output will be sent to a file called "-", since that is default value of the "output" argument. We can add correct handling of stdout to the list of future work.


output_list = []

cwl_output = "/tmp/cwloutput/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not horribly important, but instead of using "/tmp" and hardcoded subdirectories there, it would be nicer to use tempfile and its methods to get temporary file/directory names, which then you'd remove at the end of the test (or otherwise the OS will automatically remove later on too).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make this change too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code to use tempfile.mkdtemp() to create temporary directories for the CWL output and the clone of EAGLE_test_repo for input. Also, I shifted the existing use of shutil.rmtree() to delete the temporary directories to after the tests.

@james-strauss-uwa
Copy link
Collaborator Author

Thanks for your feedback. I'm happy with the changes we made here. The two issues remaining are:

  1. Modify "dlg cwl" to handle output to stdout
  2. Better feedback to users when they attempt to translate a graph with non-BashShellApp nodes.

I think we'll add these to technical debt. I'll add them to the YANDA Tech Debt page on confluence.

@rtobar
Copy link
Contributor

rtobar commented Aug 10, 2020

@james-strauss-uwa the new changes look great! I'm happy for you to merge this into the master branch.

@james-strauss-uwa james-strauss-uwa merged commit dae6406 into master Aug 10, 2020
@rtobar rtobar deleted the feature/export-cwl branch July 7, 2021 07:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants