Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provenance Support for cwltool - Single Job Executor #676

Merged
merged 338 commits into from Jul 3, 2018

Conversation

Projects
None yet
8 participants
@FarahZKhan
Copy link
Member

FarahZKhan commented Feb 26, 2018

  • make diff_pylint_report is clean
  • Test with toil-cwl-runner, does this break anything?
  • move extra Python dependencies to an extra, add code to remind the user to install them if --provenance is called
  • @mr-c to review type hints
  • remove use of hasattr
  • add raw logs
setup.py Outdated
@@ -56,6 +56,13 @@
'schema-salad >= 2.6.20170927145003, < 3',
'typing >= 3.5.3',
'six >= 1.8.0',
'prov == 1.5.1'

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

needs a trailing comma

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 26, 2018

Author Member

Thanks, Added in new commit.

@@ -0,0 +1,46 @@

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

I think the directory tests/wf/test-prov should be removed from this PR. You can keep it on another branch for your personal use, if you wish

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 26, 2018

Author Member

Thanks, done just now.

pos = nx.nx_agraph.graphviz_layout(provgraph)
nx.draw(provgraph, pos=pos)
write_dot(provgraph, provDot)
check_call(['dot','-Tpng',provDot,'-o',original_path+'/ProvenanceVisualGraph.png'])

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

Mind if I make the graphviz dependency optional? I can make a PR to this PR and you can review it

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 26, 2018

Author Member

Yes, that is fine I think. We can add Prov Viewer (https://github.com/gems-uff/prov-viewer) later.

This comment has been minimized.

@stain

stain Apr 9, 2018

Member

We removed this code as it was tricky to get transitive dependencies of nx installed.

@@ -1,3 +1,6 @@
<<<<<<< HEAD
This is whale file for testing revsort.cwl workflow
=======

This comment has been minimized.

@mr-c

This comment has been minimized.

@stain

stain Apr 9, 2018

Member

whale.txt was restored.

@@ -26,7 +26,7 @@ inputs:
doc: "The input file to be processed."
default:
class: File
location: hello.txt
location: hello.txt

This comment has been minimized.

@mr-c

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 26, 2018

Author Member

I think it was for the testing purpose, Wanted to see if the workflow paths still work if default is not given.

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

Okay, you should undo the change.

This comment has been minimized.

@stain

stain Apr 9, 2018

Member

Fixed in 61797e0

output.txt Outdated
@@ -0,0 +1 @@
elif tset elahw si siht

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

Probably should be removed from this PR

@@ -323,17 +353,18 @@ def get_from_requirements(self, r, req, pull_image, dry_run=False):
# type: (Dict[Text, Text], bool, bool, bool) -> Text
pass


# type: (bool, bool, bool, Text, **Any) -> None

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

I think the line above needs to be removed, or put where it belongs.

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 26, 2018

Author Member

Done

@mr-c mr-c self-assigned this Feb 26, 2018

import schema_salad.validate as validate
from ruamel.yaml.comments import CommentedMap, CommentedSeq
from schema_salad.sourceline import SourceLine, cmap

from .provenance import *
from . import draft2tool, expression

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

replace draft2tool with command_line_tool

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 26, 2018

Author Member

Done

@@ -170,11 +176,10 @@ def _setup(self, kwargs): # type: (Dict) -> None
_logger.debug(u"[job %s] initial work dir %s", self.name,
json.dumps({p: self.generatemapper.mapper(p) for p in self.generatemapper.files()}, indent=4))

def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"):
def _execute(self, runtime, env, kwargs, document=None, WorkflowRunID=None, ProcessProvActivity=None,reference_locations=None, rm_tmpdir=True, move_outputs="move"):
# type: (List[Text], MutableMapping[Text, Text], bool, Text) -> None

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

You need to update the type signature to match the changes you made

This comment has been minimized.

@FarahZKhan

FarahZKhan Apr 19, 2018

Author Member

done

def run(self, pull_image=True, rm_container=True,
rm_tmpdir=True, move_outputs="move", **kwargs):
def run(self, document=None, WorkflowRunID=None, ProcessProvActivity=None,reference_locations=None, pull_image=True, rm_container=True,
rm_tmpdir=True, move_outputs="move", **kwargs):
# type: (bool, bool, bool, Text, **Any) -> None

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

likewise for this type signature as well

This comment has been minimized.

@FarahZKhan

FarahZKhan Apr 19, 2018

Author Member

done

setup.py Outdated
'prov == 1.5.1',
'pathlib == 1.0.1',
'graphviz == 0.8.2',
'uuid == 1.30',

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

hmm.. this library might not be Python 3 compatible ..

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

Yeah, not updated since 2007 :-P https://pypi.python.org/pypi/uuid

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

It is a backport of the functionality within the Python 2.5+ standard library, so remove uuid from setup.py and requirements.txt and I bet many more of the tests will pass

This comment has been minimized.

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 26, 2018

Author Member

Can we still use it even if removed from setup.py and requirement.txt?

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

Yes, uuid is part of the Python Standard Library from Python 2.5 and onwards: https://docs.python.org/2/library/uuid.html

setup.py Outdated
@@ -56,6 +56,12 @@
'schema-salad >= 2.6.20170927145003, < 3',
'typing >= 3.5.3',
'six >= 1.8.0',
'prov == 1.5.1',

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

Does prov 1.5.0 work as well?

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 27, 2018

Author Member

It does but there is a bug with datetime() and that might sometimes not record the start and end time of the activity and many more time instances used in quite a few properties.
In addition, if you want to make changes to prov serialization later to have a bundle instead of just one prov file (for example in case of sub-workflows), then prov 1.5.1 is better.

setup.py Outdated
@@ -56,6 +56,12 @@
'schema-salad >= 2.6.20170927145003, < 3',
'typing >= 3.5.3',
'six >= 1.8.0',
'prov == 1.5.1',
'pathlib == 1.0.1',

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

pathlib is deprecated, probably should use pathlib2

https://pypi.python.org/pypi/pathlib/
https://pypi.python.org/pypi/pathlib2/

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 27, 2018

Author Member

resolved

setup.py Outdated
'prov == 1.5.1',
'pathlib == 1.0.1',
'graphviz == 0.8.2',
'pygraphviz == 1.3.1',

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

Are both needed? Only graphviz seems to be used

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 27, 2018

Author Member

removed pygraphviz

setup.py Outdated
'pathlib == 1.0.1',
'graphviz == 0.8.2',
'pygraphviz == 1.3.1',
'Matplotlib == 2.1.2',

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

This appears to be unused, can it be removed?

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 27, 2018

Author Member

Removed

setup.py Outdated
'graphviz == 0.8.2',
'pygraphviz == 1.3.1',
'Matplotlib == 2.1.2',
'pydot == 1.2.4'

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

Likewise pydot doesn't appear to be used either.

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 27, 2018

Author Member

Removed

setup.py Outdated
@@ -56,6 +56,12 @@
'schema-salad >= 2.6.20170927145003, < 3',
'typing >= 3.5.3',
'six >= 1.8.0',
'prov == 1.5.1',
'pathlib == 1.0.1',
'graphviz == 0.8.2',

This comment has been minimized.

@mr-c

mr-c Feb 26, 2018

Member

Is graphviz 0.8.1 sufficient? TravisCI in Python 3.3 mode can't see to find 0.8.2 https://travis-ci.org/common-workflow-language/cwltool/jobs/346298771#L521

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 27, 2018

Author Member

Like other modules, I tried updating this one too to use 0.8.1 instead but the workflow would not run if the version is changed. I am unsure about the underlying reason but can explore further.

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 27, 2018

Author Member

Also tried with 0.8 but same results.

# This makes later close() a no-op
self.folder = None

def create_ro(tmpPrefix):

This comment has been minimized.

@mr-c

mr-c Feb 27, 2018

Member

Needs a type specification

This comment has been minimized.

@stain

stain Apr 9, 2018

Member

Now

def create_researchObject(tmpPrefix  # type: str
             ):
@@ -298,7 +332,28 @@ def print_pack(document_loader, processobj, uri, metadata):
else:
return json.dumps(packed["$graph"][0], indent=4)


def generate_provDoc():

This comment has been minimized.

@mr-c

mr-c Feb 27, 2018

Member

Needs a type specification

This comment has been minimized.

@FarahZKhan

FarahZKhan Feb 28, 2018

Author Member

Done

FarahZKhan added some commits Feb 27, 2018

if builder is not None:
r.builder = builder
if r.outdir:
self.output_dirs.add(r.outdir)
r.run(**kwargs)
if ro:

This comment has been minimized.

@anton-khodak

anton-khodak Feb 28, 2018

Member

I would suggest creating a separate subclass of JobExecutor here to keep the API simpler and cleaner since the number of changes and if clauses for a single feature is large enough.

document = ProvDocument()
engineUUID=""
activity_workflowRun={} # type: Dict[Text, Any]
defaultStreamHandler = logging.StreamHandler()

This comment has been minimized.

@anton-khodak

anton-khodak Feb 28, 2018

Member

Consider reusing defaultStreamHandler from loghandler module

This comment has been minimized.

@FarahZKhan

FarahZKhan Mar 16, 2018

Author Member

Apparently I did not need it :). Thanks for indicating. removed in new commits.

import schema_salad.validate as validate
from ruamel.yaml.comments import CommentedMap, CommentedSeq
from schema_salad.sourceline import SourceLine, cmap

from .provenance import *

This comment has been minimized.

@anton-khodak

anton-khodak Feb 28, 2018

Member

Consider removing this import

This comment has been minimized.

@FarahZKhan

FarahZKhan Mar 16, 2018

Author Member

Done

# See ./cwltool/schemas/v1.0/Process.yml
hashmethod = hashlib.sha1

class RO():

This comment has been minimized.

@anton-khodak

anton-khodak Feb 28, 2018

Member

I would suggest using longer names ResearchObject or research_obj instead of RO() and ro everywhere outside the module so they are more meaningful to unfamiliar code users.

This comment has been minimized.

@FarahZKhan

FarahZKhan Mar 16, 2018

Author Member

Thanks @anton-khodak for the feedback. Definitely makes sense. I'd make the required changes.

FarahZKhan added some commits Mar 16, 2018

@@ -268,8 +308,12 @@ def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move", secret_sto

if processStatus != "success":
_logger.warning(u"[job %s] completed %s", self.name, processStatus)
if research_obj:

This comment has been minimized.

@psafont

psafont Mar 16, 2018

Contributor

Consider taking the inner blocks if research_obj: out of the if status != success: block to avoid code duplication.

This comment has been minimized.

@FarahZKhan

FarahZKhan Mar 17, 2018

Author Member

Thanks @psafont, good point. the new commit has resolved this.

FarahZKhan and others added some commits Jun 29, 2018

@mr-c

This comment has been minimized.

Copy link
Member

mr-c commented Jul 1, 2018

I ran this PR against the CWL conformance tests using

for n in $(seq 1 132); do ./run_test.sh RUNNER=cwltool -n${n} EXTRA="--provenance test_${n}"; done

Many tests passed; there are three types of problems so far

Test failed: /home/michael/cwltool/env3/bin/cwltool --provenance test_14 --outdir=/tmp/tmpkkshqorq --quiet v1.0/null-expression1-tool.cwl v1.0/empty.json
Test default usage of Any in expressions.
Returned non-zero
Got workflow error
Traceback (most recent call last):
  File "/home/michael/cwltool/cwltool/executors.py", line 149, in run_jobs
    runtimeContext.make_fs_access, runtimeContext)
  File "/home/michael/cwltool/cwltool/provenance.py", line 409, in _evaluate
    process_name = urllib.parse.quote(str(job.name), safe=":/,#")
AttributeError: 'ExpressionJob' object has no attribute 'name'
Workflow error, try again with --debug for more information:
'ExpressionJob' object has no attribute 'name'
--- Running conformance test v1.0 on /home/michael/cwltool/env3/bin/cwltool ---
/home/michael/cwltool/env3/bin/cwltool 1.0.20180701175226
Test [20/132] 
Test failed: /home/michael/cwltool/env3/bin/cwltool --provenance test_20 --outdir=/tmp/tmpdplet4yj --quiet v1.0/any-type-compat.cwl v1.0/any-type-job.json
Testing Any type compatibility in outputSource
Returned non-zero
I'm sorry, I couldn't load this CWL file, try again with --debug for more information.
The error was: An identifier is missing. All PROV elements require a valid identifier.
--- Running conformance test v1.0 on /home/michael/cwltool/env3/bin/cwltool ---
/home/michael/cwltool/env3/bin/cwltool 1.0.20180701175226
Test [85/132] 
Test failed: /home/michael/cwltool/env3/bin/cwltool --provenance test_85 --outdir=/tmp/tmp0ht0pjbd --quiet v1.0/dir2.cwl v1.0/dir-job.yml
Test directory input in Docker
Returned non-zero
Got workflow error
Traceback (most recent call last):
  File "/home/michael/cwltool/cwltool/executors.py", line 154, in run_jobs
    job.run(runtimeContext)
  File "/home/michael/cwltool/cwltool/job.py", line 483, in run
    self._execute(runtime, env, runtimeContext)
  File "/home/michael/cwltool/cwltool/job.py", line 231, in _execute
    runtimeContext.reference_locations, str(self.name))
  File "/home/michael/cwltool/cwltool/provenance.py", line 493, in used_artefacts
    with fsaccess.open(location, "rb") as fhandle:
  File "/home/michael/cwltool/cwltool/stdfsaccess.py", line 49, in open
    return open(self._abs(fn), mode)
IsADirectoryError: [Errno 21] Is a directory: '/home/michael/common-workflow-language/v1.0/v1.0/testdir'
Workflow error, try again with --debug for more information:
[Errno 21] Is a directory: '/home/michael/common-workflow-language/v1.0/v1.0/testdir'

mr-c and others added some commits Jun 30, 2018

@mr-c mr-c merged commit 2ba677a into master Jul 3, 2018

6 checks passed

Tidelift Dependencies checked
Details
codecov/patch 78.63% of diff hit (target 68.69%)
Details
codecov/project 70.13% (+1.43%) compared to 639229b
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
default Build finished.
Details

@mr-c mr-c deleted the provenance branch Jul 3, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.