Provenance Support for cwltool - Single Job Executor #676

Merged on Jul 3, 2018 (338 commits)
The diff below shows changes from 73 commits.

Commits:
0661a89
Merge branch 'provenance' of github.com:common-workflow-language/cwlt…
FarahZKhan Feb 27, 2018
eb5cd6f
fixes types
FarahZKhan Feb 27, 2018
04caac0
type fixing
FarahZKhan Feb 27, 2018
9f1019b
Merge branch 'master' into provenance
FarahZKhan Mar 16, 2018
2e44cab
work on reviews and make packed.cwl re-runnable
FarahZKhan Mar 16, 2018
3999772
remove from .provenance import *
FarahZKhan Mar 16, 2018
b48e95f
Merge branch 'provenance' of github.com:common-workflow-language/cwlt…
FarahZKhan Mar 16, 2018
7b892ea
fix missing comma in function definition
FarahZKhan Mar 16, 2018
a29ec09
change names ro to either ResearchObject or research_obj for better d…
FarahZKhan Mar 16, 2018
757b117
fix type errors for python3.5
FarahZKhan Mar 17, 2018
0497e6e
move if research_obj to avoid code duplication
FarahZKhan Mar 17, 2018
ecc62e2
add split() for packed workflow
FarahZKhan Mar 17, 2018
f564e5a
clean up executors.py, move code to provenance.py and handle list of …
FarahZKhan Mar 21, 2018
cc4feb5
rename the provenance serialization function
FarahZKhan Mar 21, 2018
9daf544
add prov-n serialization
FarahZKhan Mar 21, 2018
d492747
add docstrings for functions
FarahZKhan Mar 21, 2018
a27e0a6
Merge branch 'master' into provenance
FarahZKhan Mar 24, 2018
5dd64ad
Merge branch 'master' into provenance
FarahZKhan Mar 26, 2018
778b167
rename job file and directories using lowercase
FarahZKhan Mar 26, 2018
11b81d1
Merge branch 'provenance' of github.com:common-workflow-language/cwlt…
FarahZKhan Mar 26, 2018
9d4ab3c
Merge branch 'master' into provenance
FarahZKhan Mar 27, 2018
9a0ec6a
Merge branch 'provenance' of github.com:common-workflow-language/cwlt…
FarahZKhan Mar 28, 2018
76c5b2b
Disable graphviz output
stain Mar 28, 2018
0e82791
consistent urn:uuid: namespace
stain Mar 28, 2018
8bc016d
Ensure namespaces are (somewhat) correct
stain Mar 28, 2018
fa945d2
Improve PROV output
stain Mar 28, 2018
e6618b2
Correct namespace in PROV for wfdesc/wfprov
stain Mar 28, 2018
8f648a2
Store fixed values in RO as well
stain Mar 28, 2018
f18bab8
Correct hasSubProcess links
stain Mar 28, 2018
eb6f8c6
TODO: Declare roles/parameters as well
stain Mar 28, 2018
da9bdb9
bagit logs run identifier
stain Mar 29, 2018
9d74a2b
ensure finally block works even on early errors
stain Mar 29, 2018
10ec554
Output data added to data/
stain Mar 29, 2018
e950673
include prov files in tagmanifest
stain Mar 29, 2018
df8f4c4
Calculate checksum on all used data
stain Mar 29, 2018
b30dc6d
usage roles also in namespace (not string)
stain Mar 29, 2018
12f3ea6
Enforce --compute-checksum
stain Mar 29, 2018
509cb3a
fail early if incompatible --no-compute-checksum --provenance
stain Mar 29, 2018
86dcef6
fix to correct bug assigning none to job obj
FarahZKhan Apr 2, 2018
dbc94d2
Merge branch 'master' into provenance
FarahZKhan Apr 2, 2018
371bb76
fix type placement
FarahZKhan Apr 2, 2018
d62bd96
Merge branch 'provenance' of github.com:common-workflow-language/cwlt…
FarahZKhan Apr 2, 2018
6cbc086
Merge branch 'master' into provenance
mr-c Apr 2, 2018
2ce5934
remove no-global-site-package and add types
FarahZKhan Apr 2, 2018
c4645c2
fix Namespace import
FarahZKhan Apr 2, 2018
65ac0b9
fix type errors for travis tests
FarahZKhan Apr 2, 2018
69e1637
Merge branch 'provenance' of github.com:common-workflow-language/cwlt…
FarahZKhan Apr 2, 2018
fdadf51
fix encoding and URI spaces in names of processRun
FarahZKhan Apr 3, 2018
6cc95c1
make bagit file writing compatible for python 2 and 3
FarahZKhan Apr 3, 2018
55d5003
Python 2/3 compatible file reading
stain Apr 3, 2018
7985eb9
urllib for py2/py3
stain Apr 3, 2018
fddcc4e
prov imports tidy
stain Apr 3, 2018
b617c96
py3: avoid dict.iteritems
stain Apr 3, 2018
5898751
py3 compatible file copy
stain Apr 3, 2018
64e42c8
py3 simplify io.open usage
stain Apr 3, 2018
c19c513
Added WritableBagFile, use for prov outputs
stain Apr 3, 2018
0dd09df
More py2/py3 fun
stain Apr 3, 2018
f752266
Tidy WritableBagFile
stain Apr 3, 2018
973cff2
Record Research Object ORE manifest
stain Apr 4, 2018
dee57c2
FIXME comment
stain Apr 4, 2018
04b7baa
createdBy and conformsTo
stain Apr 4, 2018
1c46eb4
make OA annotations
stain Apr 4, 2018
a0a9c91
Merge branch 'master' into provenance
FarahZKhan Apr 4, 2018
12bf53e
fix unicode/str conversion
FarahZKhan Apr 4, 2018
8b99c03
py2/py3: Use six.moves.urllib
stain Apr 4, 2018
ee2bb97
avoid unicode() tests
stain Apr 4, 2018
affe143
log provenance for cmdtool execution
stain Apr 4, 2018
40f8f70
minimal docker execution data logged in prov
stain Apr 4, 2018
31f6176
Avoid try.except import
stain Apr 4, 2018
bd2d9bf
Also log cwlprov:image of docker container
stain Apr 4, 2018
5248d81
typeshed for past.builtins
stain Apr 4, 2018
abbe3b8
Ignore past.builtins imports in mypy
stain Apr 4, 2018
e27f061
minimal test of --provenance
stain Apr 4, 2018
c164ee1
Don't run revsort.cwl with container
stain Apr 4, 2018
fb2aad5
pytest import
stain Apr 4, 2018
bb8f5a6
pytest need onWindows
stain Apr 4, 2018
c09066d
test research object bagit structure
stain Apr 4, 2018
bef92b4
test _convert_path
stain Apr 4, 2018
a400574
Avoid ProvenanceException, ValueError is usually good
stain Apr 4, 2018
ab09a22
Merge remote-tracking branch 'origin/master' into provenance
stain Apr 4, 2018
45348dc
more provenance tests
stain Apr 4, 2018
e5a714f
fix unicode/str issue for test
FarahZKhan Apr 4, 2018
dd6c81b
test sha256/sha512
stain Apr 4, 2018
7cc3807
Merge branch 'provenance' of github.com:common-workflow-language/cwlt…
stain Apr 4, 2018
06b456d
WritableBagFile - test data/ adding
stain Apr 4, 2018
d13a9a0
make bagit a general requirement
mr-c Apr 5, 2018
897ef6a
Merge branch 'master' into provenance
stain Apr 8, 2018
ee8e420
Ensure \n newlines in Windows as well
stain Apr 9, 2018
a7ca40e
For consistency make bag-info.txt be alphabetical
stain Apr 9, 2018
61797e0
Revert whitespace changes from 480bbc2c030d185531fb8899412aa1ddf8d00c07
stain Apr 9, 2018
921fc1d
not really needing bagit import yet
stain Apr 9, 2018
97768d9
attempt to capture unix user info
stain Apr 13, 2018
ce09136
Also record user in manifest.json
stain Apr 13, 2018
205a95a
fix typo in comment
FarahZKhan Apr 16, 2018
c800f1f
Merge branch 'master' into provenance
mr-c Apr 16, 2018
495406d
remove 2nd pathlib
mr-c Apr 16, 2018
13745fb
capture ORCID
stain Apr 13, 2018
3672c21
add user/host to prov document, only if enabled
stain Apr 13, 2018
ff7db8b
orcid in manifest.json, smaller whoami
stain Apr 13, 2018
d3d9ff0
_whoami() now only simple tuple
stain Apr 13, 2018
22e2f2a
prov:label for account
stain Apr 13, 2018
81f7e2b
Formatting: Avoid trailing whitespace
stain Apr 16, 2018
7f1e64d
Shell variable ORCID as default for --orcid
stain Apr 16, 2018
dbe7791
[provenance] run: -> id:
stain Apr 16, 2018
e710230
TODO comments on provenance
stain Apr 16, 2018
0e6300b
typeshed for ResearchObject constructor
stain Apr 16, 2018
cf455b5
a bit more typing
stain Apr 16, 2018
f80da96
[provenance] More typing..
stain Apr 16, 2018
d549db9
[provenance] a bit more typeshed typing
stain Apr 16, 2018
e7df842
[provenance] Trying to do typing of job() .. not easy!
stain Apr 16, 2018
5ead782
[provenance] more typing.
stain Apr 16, 2018
13e0bf7
Corrected orcid pattern
stain Apr 16, 2018
54b72ed
[provenance] Correct namespace for --enable-host-provenance
stain Apr 16, 2018
8f3ba1c
README: about --provenance and Research Objects
stain Apr 16, 2018
6b1c876
Test ORCID checksum
stain Apr 17, 2018
f82c794
Support for CWL_FULL_NAME environment variable
stain Apr 17, 2018
fcd9acc
[provenance] --help hint about shell variables
stain Apr 17, 2018
623fbfb
Only need "future" in Python 2
stain Apr 17, 2018
9d48df5
remove unused loader
FarahZKhan Apr 18, 2018
1cc863d
fix where to call startProcess()
FarahZKhan Apr 18, 2018
1998505
combine two if conditions using research_obj
FarahZKhan Apr 18, 2018
f002ec4
remove commented import
FarahZKhan Apr 18, 2018
c431e9d
remove commented line from main.py
FarahZKhan Apr 18, 2018
81518f7
remove lengthy comments
FarahZKhan Apr 18, 2018
fbf2aac
fix type in job.py
FarahZKhan Apr 18, 2018
ebb372a
line wrap for function call
FarahZKhan Apr 18, 2018
4f53bef
type fixing
FarahZKhan Apr 18, 2018
32744ab
more changes for independent commandline tool execution with --proven…
FarahZKhan Apr 18, 2018
ec8b886
Merge branch 'master' into provenance
FarahZKhan Apr 18, 2018
a79abbd
add more comments
FarahZKhan Apr 18, 2018
d581772
Merge branch 'provenance' of github.com:common-workflow-language/cwlt…
FarahZKhan Apr 18, 2018
c70f62a
Merge branch 'master' into provenance
FarahZKhan Apr 18, 2018
be370e6
add missing 'future' dependency
mr-c Apr 18, 2018
4d06603
move docstring in the function
FarahZKhan Apr 19, 2018
628c4ae
Merge branch 'master' into provenance
FarahZKhan Apr 19, 2018
e81f497
Merge branch 'provenance' of github.com:common-workflow-language/cwlt…
FarahZKhan Apr 19, 2018
3a51227
refactor printdeps() to avoid print in case of RO
FarahZKhan Apr 19, 2018
a699e46
pass research_obj instead of kwargs in _execute()
FarahZKhan Apr 19, 2018
3212eb9
move workflowRunID and Provdocument generation to provenance.py
FarahZKhan Apr 19, 2018
673326b
Merge branch 'master' into provenance
FarahZKhan Apr 19, 2018
7a326c2
fix merge
mr-c Apr 19, 2018
0cf6dde
remove unwanted if condition and add return in printdeps
FarahZKhan Apr 19, 2018
2187441
update types
FarahZKhan Apr 19, 2018
e4e094f
remove extra pathlib2 dep
mr-c Apr 19, 2018
a50c171
move prov initializers
mr-c Apr 19, 2018
4f9369f
Merge branch 'master' into provenance
mr-c Apr 19, 2018
2bc2f84
Merge branch 'master' into provenance
FarahZKhan Apr 20, 2018
4b1978e
test provenance class for nested workflows
FarahZKhan Apr 23, 2018
919c9eb
remove unwanted print statement
FarahZKhan Apr 23, 2018
ad0726a
move declare_artefacts() to provenance class
FarahZKhan Apr 23, 2018
2e56f5d
for user provenance refer to correct document
FarahZKhan Apr 23, 2018
d739519
move prospective provenance capture to provenance class
FarahZKhan Apr 23, 2018
ec5ae84
fix types and add None for positional provdoc argument
FarahZKhan Apr 23, 2018
6004b86
remove unused variables from main and fix types
FarahZKhan Apr 24, 2018
debac27
Merge branch 'master' into provenance
mr-c Apr 24, 2018
0fe1ed8
Merge branch 'master' into provenance
FarahZKhan Apr 30, 2018
7229e4f
move variables
FarahZKhan May 1, 2018
be33530
Merge branch 'master' into provenance
mr-c May 1, 2018
0b4d085
Merge branch 'master' into provenance
FarahZKhan May 2, 2018
d3dfd32
support for nested provenance profiles
FarahZKhan May 13, 2018
2e9a035
Merge branch 'master' into provenance
FarahZKhan May 13, 2018
b036aaf
remove commented part and clean code
FarahZKhan May 13, 2018
72307d8
remove unwanted provobj argument
FarahZKhan May 13, 2018
865c15f
remove unwanted arguments
FarahZKhan May 13, 2018
bd57c6a
remove prints
FarahZKhan May 13, 2018
e7ee228
make it runnable without provenance flag
May 14, 2018
fb0522a
move the evaluation of r to provenance
FarahZKhan May 15, 2018
d257ef4
remove extra if condition
FarahZKhan May 16, 2018
d8e7759
rename parent to parent_wf
FarahZKhan May 16, 2018
0f9a8f3
add step for nested workflow in primary provenance profile
FarahZKhan May 19, 2018
4c9e747
add fixme for engine
FarahZKhan May 22, 2018
53be19e
remive unwanted parameters from run()
FarahZKhan May 22, 2018
4ddfb99
use workflow run URIs to identify the output callback
FarahZKhan May 22, 2018
c7421b1
Merge branch 'master' into provenance
FarahZKhan May 22, 2018
3cdeaef
Update job.py
FarahZKhan May 22, 2018
d6c3cf5
add changes from master for conflict resolution
FarahZKhan May 22, 2018
feab3b8
remove duplicate logger arg
FarahZKhan May 22, 2018
d108848
fix some pylint issues
FarahZKhan May 22, 2018
9685afc
add provDocument and provEntity for typing
FarahZKhan May 22, 2018
0ea5b90
conflict resolution with provenance branch
FarahZKhan May 22, 2018
53fc04d
Revert "conflict resolution with provenance branch"
FarahZKhan May 22, 2018
8a1d437
add if conditions to check RO flag
FarahZKhan May 22, 2018
f89d05f
testing
FarahZKhan May 22, 2018
9110d8e
Merge branch 'master' into provenance
tetron May 22, 2018
536fc57
updating types
FarahZKhan May 22, 2018
24a96b9
add param to the static_checker
FarahZKhan May 22, 2018
c3de49d
Merge branch 'master' into provenance
FarahZKhan May 23, 2018
8aa5a37
resolve conflicts
FarahZKhan May 23, 2018
a2df93c
resolve conflicts
FarahZKhan May 23, 2018
383b701
convert tmp_dir_prefix to str to resolve error
FarahZKhan May 23, 2018
4db2079
Update setup.py
FarahZKhan May 23, 2018
7684384
Update requirements.txt
FarahZKhan May 23, 2018
9c625f0
Revert "Revert "conflict resolution with provenance branch""
FarahZKhan May 23, 2018
7d56623
Provenance nested (#768)
mr-c May 23, 2018
d15820d
Merge branch 'master' into provenance
mr-c May 23, 2018
7edde4a
misc fixes
mr-c May 23, 2018
a146089
some formatting cleanups
mr-c May 23, 2018
5ffc743
fail less tests
mr-c May 23, 2018
6011c27
fix execution of sub workflows w/o prov
mr-c May 23, 2018
41e39f4
Merge branch 'master' into provenance
FarahZKhan May 24, 2018
0e2c425
Merge branch 'master' into provenance
FarahZKhan May 25, 2018
941947e
fix mypy issues and move versionstring() to utils
May 25, 2018
3ea49fb
Merge branch 'master' into provenance
FarahZKhan May 28, 2018
47ccd50
Merge branch 'master' into provenance
FarahZKhan May 28, 2018
7daca5c
add example nested workflow from user guide
May 30, 2018
65ba208
Merge branch 'master' into provenance
FarahZKhan Jun 1, 2018
86b1594
Merge branch 'master' into provenance
FarahZKhan Jun 4, 2018
c176a5a
fix test for single tool execution
Jun 4, 2018
f8ea873
Merge branch 'provenance' of https://github.com/common-workflow-langu…
Jun 4, 2018
e121c5c
remove redundant activity for commandline tool execution
Jun 5, 2018
89acdf5
better name for snapshot function
FarahZKhan Jun 5, 2018
d06462e
fix order of arguments for provenance
FarahZKhan Jun 5, 2018
079983c
enable host and user provenance for primary wf profile only
FarahZKhan Jun 5, 2018
90c01c9
fix typing
FarahZKhan Jun 6, 2018
3cc1b67
Merge remote-tracking branch 'origin/master' into provenance
mr-c Jun 16, 2018
74c2af1
types and merge fixups
mr-c Jun 16, 2018
3aff37d
add missing attributes
mr-c Jun 23, 2018
d3a468c
Merge pull request #797 from common-workflow-language/provenance_mrc
mr-c Jun 23, 2018
1287114
Merge branch 'master' into provenance
mr-c Jun 23, 2018
a8cf68e
Remove unwanted import
FarahZKhan Jun 25, 2018
3a4bfa5
fix conflict
FarahZKhan Jun 25, 2018
d6cfb9a
reduce changes to method signatures; provObj→prov_obj
mr-c Jun 28, 2018
2df1775
Fix py2 bug resulting in infinite file writing.
Jun 28, 2018
a5e887e
more cleanups
mr-c Jun 28, 2018
fe3a427
initialize ProcessRunID in the context
FarahZKhan Jun 29, 2018
8a0179c
merge initialization of provObj
FarahZKhan Jun 29, 2018
a8b518f
type fix
mr-c Jun 29, 2018
22a8efe
fix recent merge
FarahZKhan Jun 30, 2018
e1bffed
Merge branch 'provenance' of https://github.com/common-workflow-langu…
FarahZKhan Jun 30, 2018
77a232c
make tool compatible with runtimeContext and loadingContext
FarahZKhan Jul 1, 2018
56ff004
make single commandline tool runnable for RO generation
FarahZKhan Jul 1, 2018
f72dd27
fix types
FarahZKhan Jul 1, 2018
f18dcff
cleanups
mr-c Jun 30, 2018
67679df
check for name
mr-c Jul 1, 2018
3603b50
prov_obj for ExpressionToolJob
mr-c Jul 1, 2018
c55da30
a few more cleanups
mr-c Jul 1, 2018
05e3549
Don't fail on Directory
FarahZKhan Jul 2, 2018
db0902c
remove improper print
FarahZKhan Jul 2, 2018
f2db454
deal with duplicates
mr-c Jul 2, 2018
657d0d7
CallbackJob prov_obj
mr-c Jul 2, 2018
afa8442
fix nproc on OS x
FarahZKhan Jul 2, 2018
6a4b32a
Don't change job order object
FarahZKhan Jul 2, 2018
579f8a4
Semantic Versioning of CWLProv
stain Jul 2, 2018
a918768
fix pylint report
FarahZKhan Jul 2, 2018
389a73c
merge version changes
FarahZKhan Jul 2, 2018
487fea3
fix misplaced typing
FarahZKhan Jul 3, 2018
02753be
solve PermissionError import
FarahZKhan Jul 3, 2018
728bcb8
remove graphviz dep
mr-c Jul 3, 2018
84dcdf0
move prov docs out of README
mr-c Jul 3, 2018
1a68957
cleanups
mr-c Jul 3, 2018
2 changes: 2 additions & 0 deletions .gitignore
@@ -30,3 +30,5 @@ typeshed/2and3/ruamel/yaml

#mypy
.mypy_cache/
bin/
lib/
4 changes: 4 additions & 0 deletions cwltool/argparser.py
@@ -120,6 +120,10 @@ def arg_parser(): # type: () -> argparse.ArgumentParser
type=float,
default=20)

parser.add_argument("--provenance",
help="Save provenance to specified folder as a Research Object that capture and aggregate workflow execution and data products.",
type=Text)

exgroup = parser.add_mutually_exclusive_group()
exgroup.add_argument("--print-rdf", action="store_true",
help="Print corresponding RDF graph for workflow and exit")
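A minimal sketch of how the new `--provenance` option parses, assuming Python 3 (`str` standing in for `typing.Text`). The checksum switches and the `validate()` helper are illustrative assumptions based on commits 12f3ea6 ("Enforce --compute-checksum") and 509cb3a ("fail early if incompatible --no-compute-checksum --provenance"). They are not copied from cwltool's argparser.

```python
import argparse


def build_parser():
    # type: () -> argparse.ArgumentParser
    """Trimmed-down, hypothetical parser with the --provenance option from the diff."""
    parser = argparse.ArgumentParser(prog="cwltool-sketch")
    parser.add_argument(
        "--provenance",
        help="Save provenance to the specified folder as a Research Object that "
             "captures and aggregates workflow execution and data products.",
        type=str)
    # Assumed checksum switches, declared here only for illustration.
    checksums = parser.add_mutually_exclusive_group()
    checksums.add_argument("--compute-checksum", dest="compute_checksum",
                           action="store_true", default=True)
    checksums.add_argument("--no-compute-checksum", dest="compute_checksum",
                           action="store_false")
    return parser


def validate(args):
    # type: (argparse.Namespace) -> None
    """Fail early: provenance identifies data by checksum, so checksums must stay on."""
    if args.provenance and not args.compute_checksum:
        raise SystemExit("--provenance cannot be combined with --no-compute-checksum")


args = build_parser().parse_args(["--provenance", "my-ro"])
validate(args)  # passes, because checksums default to on
```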
81 changes: 76 additions & 5 deletions cwltool/executors.py
@@ -3,16 +3,22 @@
import threading

import os
import copy
import uuid
import datetime
import time
from abc import ABCMeta, abstractmethod

import prov.model as prov
from typing import Dict, Text, Any, Tuple, Set, List


from .builder import Builder
from .errors import WorkflowException
from .mutation import MutationManager
from .job import JobBase
from .process import relocateOutputs, cleanIntermediate, Process
from .process import relocateOutputs, cleanIntermediate, Process, shortname, uniquename, get_overrides
from . import loghandler
from schema_salad.sourceline import SourceLine

_logger = logging.getLogger("cwltool")
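The newly imported `shortname` is what `run_jobs` uses to match each tool input against the job order when it builds `customised_job`. Here is a minimal sketch of that resolution order: the explicit value comes first, then the declared default, and otherwise an error is raised. It assumes tool inputs are plain dicts. `shortname` and `WorkflowException` are simplified local stand-ins, not cwltool's implementations. Deep-copying prevents the Research Object bookkeeping from mutating the user's job order (compare commit 6a4b32a, "Don't change job order object").

```python
import copy
from typing import Any, Dict, List


class WorkflowException(Exception):
    """Local stand-in for cwltool.errors.WorkflowException."""


def shortname(inputid):
    # type: (str) -> str
    """Simplified stand-in: last path segment of the fragment (or of the whole id)."""
    fragment = inputid.split("#")[-1]
    return fragment.split("/")[-1]


def resolve_inputs(tool_inputs, job_order):
    # type: (List[Dict[str, Any]], Dict[str, Any]) -> Dict[str, Any]
    """Explicit job-order value first, then the declared default, otherwise fail."""
    resolved = {}  # type: Dict[str, Any]
    for inp in tool_inputs:
        iid = shortname(inp["id"])
        if iid in job_order:
            # Deep copy so later provenance bookkeeping cannot mutate the caller's job order.
            resolved[iid] = copy.deepcopy(job_order[iid])
        elif "default" in inp:
            resolved[iid] = copy.deepcopy(inp["default"])
        else:
            raise WorkflowException(
                "Input '%s' not in input object and does not have a default value."
                % inp["id"])
    return resolved
```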

@@ -36,6 +42,7 @@ def output_callback(self, out, processStatus):
def run_jobs(self,
t, # type: Process
job_order_object, # type: Dict[Text, Any]
provDoc,
logger,
**kwargs # type: Any
):
@@ -44,6 +51,9 @@ def run_jobs(self,
def execute(self, t, # type: Process
job_order_object, # type: Dict[Text, Any]
logger=_logger,
provDoc=None,
engineID=None,
WorkflowID=None,
**kwargs # type: Any
):
# type: (...) -> Tuple[Dict[Text, Any], Text]
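`execute()` now passes `provDoc`, `engineID` and `WorkflowID` to `run_jobs`. There they are used to record prospective provenance (one plan entity per step) and retrospective provenance (one `run:<uuid>` activity per step, started by the workflow run). The sketch below shows that bookkeeping with a hypothetical in-memory `ProvRecorder`. This class is not the `prov` package's `ProvDocument` API, which the diff calls through `document.entity`, `document.activity` and `document.wasStartedBy`. There is also one design difference, offered only as a suggestion. The diff stores the step list as a single string under the key `"wfdesc:hasSubProcess="` (note the trailing `=`). The sketch instead records one `hasSubProcess` link per step.

```python
import datetime
import uuid
from typing import Any, Dict, List, Optional, Tuple


class ProvRecorder(object):
    """Hypothetical in-memory stand-in for a PROV document (NOT the prov package API)."""

    def __init__(self):
        # type: () -> None
        self.entities = {}  # type: Dict[str, Dict[str, Any]]
        self.activities = {}  # type: Dict[str, Dict[str, Any]]
        # Each relation is (relation, subject, object, timestamp or None).
        self.relations = []  # type: List[Tuple[str, str, str, Optional[datetime.datetime]]]

    def entity(self, identifier, attributes):
        # type: (str, Dict[str, Any]) -> str
        self.entities[identifier] = dict(attributes)
        return identifier

    def activity(self, identifier, attributes):
        # type: (str, Dict[str, Any]) -> str
        self.activities[identifier] = dict(attributes)
        return identifier

    def relate(self, relation, subject, obj, when=None):
        # type: (str, str, str, Optional[datetime.datetime]) -> None
        self.relations.append((relation, subject, obj, when))


def record_plan(doc, step_names):
    # type: (ProvRecorder, List[str]) -> None
    """Prospective provenance: the workflow plan plus one plan entity per step."""
    doc.entity("wf:main", {"prov:type": ["wfdesc:Process", "prov:Plan"],
                           "prov:label": "Prospective provenance"})
    for name in step_names:
        step_id = doc.entity("wf:main/" + name,
                             {"prov:type": ["wfdesc:Process", "prov:Plan"]})
        # One link per step instead of a stringified list in one attribute.
        doc.relate("wfdesc:hasSubProcess", "wf:main", step_id)


def start_step_run(doc, workflow_run_id, step_name):
    # type: (ProvRecorder, str, str) -> str
    """Retrospective provenance: mint a run: id and record that the workflow run started it."""
    run_id = doc.activity("run:" + str(uuid.uuid4()),
                          {"prov:type": "wfprov:ProcessRun",
                           "prov:label": "Run of workflow/packed.cwl#main/" + step_name})
    doc.relate("prov:wasStartedBy", run_id, workflow_run_id, datetime.datetime.now())
    return run_id
```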
@@ -66,7 +76,7 @@ def execute(self, t, # type: Process
for req in jobReqs:
t.requirements.append(req)

self.run_jobs(t, job_order_object, logger, **kwargs)
self.run_jobs(t, job_order_object, provDoc, engineID, WorkflowID, logger, **kwargs)

if self.final_output and self.final_output[0] and finaloutdir:
self.final_output[0] = relocateOutputs(self.final_output[0], finaloutdir,
@@ -87,22 +97,83 @@ class SingleJobExecutor(JobExecutor):
def run_jobs(self,
t, # type: Process
job_order_object, # type: Dict[Text, Any]
document,
engineUUID,
WorkflowRunID,
logger,
**kwargs # type: Any
):
reference_locations={}
ProvActivity_dict={}
jobiter = t.job(job_order_object,
self.output_callback,
**kwargs)

try:
ro = kwargs.get("ro")
for r in jobiter:
if r:
builder = kwargs.get("builder", None) # type: Builder

if builder is not None:
r.builder = builder
if r.outdir:
self.output_dirs.add(r.outdir)
r.run(**kwargs)
if ro:
Review comment (Member): I would suggest creating a separate subclass of JobExecutor here to keep the API simpler and cleaner, since this single feature adds a large number of changes and if clauses.

#here we are recording provenance of each subprocess of the workflow
if ".cwl" in getattr(r, "name"): #for prospective provenance
steps=[]
for s in r.steps:
stepname="wf:main/"+str(s.name)[5:]
steps.append(stepname)
print("step name is: ", stepname)
document.entity(stepname, {prov.PROV_TYPE: "wfdesc:Process", "prov:type": "prov:Plan"})
#create prospective provenance recording for the workflow
document.entity("wf:main", {prov.PROV_TYPE: "wfdesc:Process", "prov:type": "prov:Plan", "wfdesc:hasSubProcess=":str(steps), "prov:label":"Prospective provenance"})
customised_job={} #new job object for RO
for e, i in enumerate(r.tool["inputs"]):
with SourceLine(r.tool["inputs"], e, WorkflowException, _logger.isEnabledFor(logging.DEBUG)):
iid = shortname(i["id"])
if iid in job_order_object:
customised_job[iid]= copy.deepcopy(job_order_object[iid]) #add the input element in dictionary for provenance
elif "default" in i:
customised_job[iid]= copy.deepcopy(i["default"]) #add the default elements in the dictionary for provenance
else:
raise WorkflowException(
u"Input '%s' not in input object and does not have a default value." % (i["id"]))
##create master-job.json and returns a dictionary with workflow level identifiers as keys and locations or actual values of the attributes as values.
Review comment (Member): Please name the file primary-job.json and update it everywhere.
Reply (Member Author): Done.
relativised_input_object=ro.create_job(customised_job, kwargs) #call the method to generate a file with customised job
for key, value in relativised_input_object.items():
strvalue=str(value)
if "data" in strvalue:
shahash="data:"+value.split("/")[-1]
rel_path=value[3:]
reference_locations[job_order_object[key]["location"]]=relativised_input_object[key][11:]
document.entity(shahash, {prov.PROV_TYPE:"wfprov:Artifact"})
#document.specializationOf(rel_path, shahash) NOTE:THIS NEEDS FIXING as it required both params as entities.
else:
ArtefactValue="data:"+strvalue
document.entity(ArtefactValue, {prov.PROV_TYPE:"wfprov:Artifact"})
if ".cwl" not in getattr(r, "name"):
if ro:
ProcessRunID="run:"+str(uuid.uuid4())
#each subprocess is defined as an activity()
provLabel="Run of workflow/packed.cwl#main/"+str(r.name)
ProcessProvActivity = document.activity(ProcessRunID, None, None, {prov.PROV_TYPE: "wfprov:ProcessRun", "prov:label": provLabel})
if hasattr(r, 'name') and ".cwl" not in getattr(r, "name"):
document.wasAssociatedWith(ProcessRunID, engineUUID, str("wf:main/"+r.name))
document.wasStartedBy(ProcessRunID, None, WorkflowRunID, datetime.datetime.now(), None, None)
#this is where you run each step. so start and end time for the step
r.run(document, WorkflowRunID, ProcessProvActivity, reference_locations, **kwargs)
else:
r.run(**kwargs)
#capture workflow level outputs in the prov doc
if ro:
for eachOutput in self.final_output:
for key, value in eachOutput.items():
outputProvRole="wf:main"+"/"+str(key)
output_checksum="data:"+str(value["checksum"][5:])
document.entity(output_checksum, {prov.PROV_TYPE:"wfprov:Artifact"})
document.wasGeneratedBy(output_checksum, WorkflowRunID, datetime.datetime.now(), None, {"prov:role":outputProvRole })
else:
logger.error("Workflow cannot make any more progress.")
break
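The review suggestion above was to move the provenance logic into a separate `JobExecutor` subclass. That could look roughly like this: the base loop calls no-op hooks, and only the subclass does provenance work, so the shared code path needs no `if ro:` checks. This is a hedged template-method sketch. The hook names (`before_job`, `after_job`) and the simplified `JobExecutor` stand-in are hypothetical, and this is not how cwltool ended up structuring the code.

```python
from typing import Any, Callable, Iterable


class JobExecutor(object):
    """Simplified stand-in for cwltool's JobExecutor; the hook names are hypothetical."""

    def run_jobs(self, jobs, **kwargs):
        # type: (Iterable[Any], **Any) -> None
        for job in jobs:
            self.before_job(job)
            job.run(**kwargs)
            self.after_job(job)

    def before_job(self, job):
        # type: (Any) -> None
        """No-op by default: plain execution records nothing."""

    def after_job(self, job):
        # type: (Any) -> None
        """No-op by default."""


class ProvenanceJobExecutor(JobExecutor):
    """Keeps all provenance bookkeeping out of the shared run loop."""

    def __init__(self, record):
        # type: (Callable[[str], None]) -> None
        self.record = record

    def before_job(self, job):
        # type: (Any) -> None
        self.record("started " + job.name)

    def after_job(self, job):
        # type: (Any) -> None
        self.record("ended " + job.name)
```

With this split, enabling provenance becomes a choice of executor class instead of a flag checked throughout `run_jobs`.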
58 changes: 45 additions & 13 deletions cwltool/job.py
@@ -12,14 +12,20 @@
import subprocess
import sys
import tempfile
import prov.model as prov
from abc import ABCMeta, abstractmethod
from io import open
from threading import Lock

import shellescape

import time
import datetime
from .utils import copytree_with_merge, docker_windows_path_adjust, onWindows
from typing import (IO, Any, Callable, Dict, Iterable, List, MutableMapping, Text,
Union, cast)


from .builder import Builder
from .errors import WorkflowException
from .pathmapper import PathMapper
@@ -170,11 +176,10 @@ def _setup(self, kwargs): # type: (Dict) -> None
_logger.debug(u"[job %s] initial work dir %s", self.name,
json.dumps({p: self.generatemapper.mapper(p) for p in self.generatemapper.files()}, indent=4))

def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"):
def _execute(self, runtime, env, kwargs, document=None, WorkflowRunID=None, ProcessProvActivity=None,reference_locations=None, rm_tmpdir=True, move_outputs="move"):
# type: (List[Text], MutableMapping[Text, Text], bool, Text) -> None
Review comment (Member): You need to update the type signature to match the changes you made.
Reply (Member Author): Done.

ro = kwargs.get("ro")
scr, _ = get_feature(self, "ShellCommandRequirement")

shouldquote = None # type: Callable[[Any], Any]
if scr:
shouldquote = lambda x: False
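The reviewer asked for the type signature to be updated. For the widened `_execute`, a comment-style annotation could use the per-argument `# type:` form that executors.py already uses. In this sketch the provenance objects are typed as `Optional[Any]`, an assumption that keeps it free of the `prov` dependency. `JobBaseSketch` is a hypothetical stand-in class.

```python
from typing import Any, Dict, List, MutableMapping, Optional, Text


class JobBaseSketch(object):
    """Hypothetical stand-in for JobBase, showing only the annotated signature."""

    def _execute(self,
                 runtime,                   # type: List[Text]
                 env,                       # type: MutableMapping[Text, Text]
                 kwargs,                    # type: Dict[Text, Any]
                 document=None,             # type: Optional[Any]
                 WorkflowRunID=None,        # type: Optional[Text]
                 ProcessProvActivity=None,  # type: Optional[Any]
                 reference_locations=None,  # type: Optional[Dict[Text, Text]]
                 rm_tmpdir=True,            # type: bool
                 move_outputs="move"        # type: Text
                 ):
        # type: (...) -> None
        # Body elided: only the signature and its annotations matter here.
        return None
```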
@@ -189,7 +194,19 @@ def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"):
u' < %s' % self.stdin if self.stdin else '',
u' > %s' % os.path.join(self.outdir, self.stdout) if self.stdout else '',
u' 2> %s' % os.path.join(self.outdir, self.stderr) if self.stderr else '')

if hasattr(self, "joborder"):
for key, value in getattr(self, "joborder").items():
if ro:
provRole=self.name+"/"+str(key)
ProcessRunID=str(ProcessProvActivity._identifier)
if 'location' in str(value):
location=str(value['location'])
if location in reference_locations: #workflow level inputs referenced as hash in prov document
document.used(ProcessRunID, "data:"+str(reference_locations[location]), datetime.datetime.now(), None, {"prov:role":provRole })
else: #add checksum created by cwltool of the intermediate data products. NOTE: will only work if --compute-checksums is enabled.
document.used(ProcessRunID, "data:"+str(value['checksum'][5:]), datetime.datetime.now(),None, {"prov:role":provRole })
else: #add the actual data value in the prov document
document.used(ProcessRunID, "data:"+str(value), datetime.datetime.now(),None, {"prov:role":provRole })
outputs = {} # type: Dict[Text,Text]

try:
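Before calling `document.used(...)`, the added block in `_execute` chooses one of three identifiers for each job-order input: the identifier of a workflow-level input already in the Research Object, the checksum of an intermediate file, or the literal value. The sketch below shows a minimal version of that decision. It assumes CWL-style `algorithm$digest` checksum strings and a `reference_locations` dict that maps locations to the identifiers already stored for workflow-level inputs. The helper names are hypothetical. The sketch tests `isinstance(value, dict)` rather than `'location' in str(value)`, because the string test would also match a plain string that happens to contain the word "location".

```python
import datetime
from typing import Any, Dict, List, Tuple


def input_artifact_id(value, reference_locations):
    # type: (Any, Dict[str, str]) -> str
    """Identifier for one job-order input, following the three cases in _execute."""
    if isinstance(value, dict) and "location" in value:
        location = str(value["location"])
        if location in reference_locations:
            # Workflow-level input already stored in the Research Object.
            return "data:" + reference_locations[location]
        # Intermediate file: relies on checksums having been computed.
        return "data:" + value["checksum"].partition("$")[2]
    # Plain value (string, number, boolean): use it directly.
    return "data:" + str(value)


def usage_records(step_name, run_id, joborder, reference_locations):
    # type: (str, str, Dict[str, Any], Dict[str, str]) -> List[Tuple[str, str, str, datetime.datetime]]
    """One (activity, entity, role, time) tuple per input, i.e. what document.used() receives."""
    return [(run_id, input_artifact_id(value, reference_locations),
             step_name + "/" + key, datetime.datetime.now())
            for key, value in sorted(joborder.items())]
```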
@@ -214,6 +231,7 @@ def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"):
stdout_path = absout

commands = [Text(x) for x in (runtime + self.command_line)]

job_script_contents = None # type: Text
builder = getattr(self, "builder", None) # type: Builder
if builder is not None:
@@ -227,7 +245,6 @@ def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"):
cwd=self.outdir,
job_script_contents=job_script_contents,
)

if self.successCodes and rcode in self.successCodes:
processStatus = "success"
elif self.temporaryFailCodes and rcode in self.temporaryFailCodes:
@@ -244,6 +261,15 @@ def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"):

outputs = self.collect_outputs(self.outdir)
outputs = bytes2str_in_dicts(outputs) # type: ignore
#creating entities for the outputs produced by each step (in the provenance document) and associating them with
#the ProcessRunID
if ro:
for key, value in outputs.items():
StepOutput_checksum="data:"+str(value["checksum"][5:])
document.entity(StepOutput_checksum, {prov.PROV_TYPE:"wfprov:SubProcessArtifact"})
stepProv="wf:main"+"/"+str(self.name)+"/"+str(key)
ProcessRunID=str(ProcessProvActivity._identifier)
document.wasGeneratedBy(StepOutput_checksum, ProcessRunID, datetime.datetime.now(), None, {"prov:role":stepProv})

except OSError as e:
if e.errno == 2:
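Output entities are named from each file's CWL checksum. The diff slices `value["checksum"][5:]`, which removes exactly the five characters of a `sha1$` prefix. If a checksum used another algorithm, such as `sha256$...`, a fixed slice would leave `56$` at the start of the identifier. The sketch below partitions on `$` instead and names roles `<workflow>/<step>/<output>` as the diff does. The function names are hypothetical.

```python
import hashlib
from typing import Any, Dict, List, Tuple


def cwl_checksum(data, algorithm="sha1"):
    # type: (bytes, str) -> str
    """CWL-style checksum string: '<algorithm>$<hex digest>'."""
    return algorithm + "$" + hashlib.new(algorithm, data).hexdigest()


def artifact_id(checksum):
    # type: (str) -> str
    """'sha1$ab12...' -> 'data:ab12...', independent of the algorithm prefix length."""
    _algorithm, sep, digest = checksum.partition("$")
    if not sep or not digest:
        raise ValueError("expected '<algorithm>$<digest>', got %r" % checksum)
    return "data:" + digest


def step_output_records(workflow, step_name, outputs):
    # type: (str, str, Dict[str, Dict[str, Any]]) -> List[Tuple[str, str]]
    """(entity id, role) pairs, with roles named <workflow>/<step>/<output> as in the diff."""
    return [(artifact_id(value["checksum"]), workflow + "/" + step_name + "/" + key)
            for key, value in sorted(outputs.items())]
```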
@@ -263,8 +289,12 @@ def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"):

if processStatus != "success":
_logger.warning(u"[job %s] completed %s", self.name, processStatus)
if ro:
document.wasEndedBy(str(ProcessProvActivity._identifier), None, WorkflowRunID, datetime.datetime.now())
else:
_logger.info(u"[job %s] completed %s", self.name, processStatus)
if ro:
document.wasEndedBy(str(ProcessProvActivity._identifier), None, WorkflowRunID, datetime.datetime.now())

if _logger.isEnabledFor(logging.DEBUG):
_logger.debug(u"[job %s] %s", self.name, json.dumps(outputs, indent=4))
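Both branches of the status check end with the same `document.wasEndedBy(...)` call, so the end event could be recorded once, after the status is logged. A hedged sketch: `record_end` is a hypothetical callback standing in for `document.wasEndedBy`, and `None` means provenance is disabled.

```python
import datetime
import logging
from typing import Callable, Optional

_logger = logging.getLogger("cwltool")


def finish_job(name, process_status, record_end=None):
    # type: (str, str, Optional[Callable[[datetime.datetime], None]]) -> None
    """Log the outcome, then record the end event once, whatever the status."""
    if process_status != "success":
        _logger.warning(u"[job %s] completed %s", name, process_status)
    else:
        _logger.info(u"[job %s] completed %s", name, process_status)
    # record_end stands in for document.wasEndedBy(...); None means provenance is off.
    if record_end is not None:
        record_end(datetime.datetime.now())
```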
@@ -283,8 +313,8 @@ def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"):

class CommandLineJob(JobBase):

def run(self, pull_image=True, rm_container=True,
rm_tmpdir=True, move_outputs="move", **kwargs):
def run(self, document=None, WorkflowRunID=None, ProcessProvActivity=None,reference_locations=None, pull_image=True, rm_container=True,
rm_tmpdir=True, move_outputs="move", **kwargs):
# type: (bool, bool, bool, Text, **Any) -> None
Review comment (Member): Likewise for this type signature.
Reply (Member Author): Done.

self._setup(kwargs)
@@ -312,7 +342,7 @@ def run(self, pull_image=True, rm_container=True,
stageFiles(self.generatemapper, ignoreWritable=self.inplace_update, symLink=True)
relink_initialworkdir(self.generatemapper, self.outdir, self.builder.outdir, inplace_update=self.inplace_update)

self._execute([], env, rm_tmpdir=rm_tmpdir, move_outputs=move_outputs)
self._execute([], env, kwargs, document, WorkflowRunID, ProcessProvActivity,reference_locations, rm_tmpdir=rm_tmpdir, move_outputs=move_outputs)


class ContainerCommandLineJob(JobBase):
@@ -323,17 +353,18 @@ def get_from_requirements(self, r, req, pull_image, dry_run=False):
# type: (Dict[Text, Text], bool, bool, bool) -> Text
pass


# type: (bool, bool, bool, Text, **Any) -> None
Review comment (Member): I think the line above needs to be removed, or put where it belongs.
Reply (Member Author): Done.
@abstractmethod
def create_runtime(self, env, rm_container, record_container_id, cidfile_dir,
cidfile_prefix, **kwargs):
# type: (MutableMapping[Text, Text], bool, bool, Text, Text, **Any) -> List
pass

def run(self, pull_image=True, rm_container=True,
def run(self, document=None, WorkflowRunID=None, ProcessProvActivity=None,
reference_locations=None, pull_image=True, rm_container=True,
record_container_id=False, cidfile_dir="",
cidfile_prefix="",
rm_tmpdir=True, move_outputs="move", **kwargs):
# type: (bool, bool, bool, Text, Text, bool, Text, **Any) -> None
cidfile_prefix="", rm_tmpdir=True, move_outputs="move", **kwargs):
Review comment (Member): What happened to the type annotations?
Reply (Member Author): Added back.

(docker_req, docker_is_req) = get_feature(self, "DockerRequirement")

@@ -382,7 +413,7 @@ def run(self, pull_image=True, rm_container=True,
runtime = self.create_runtime(env, rm_container, record_container_id, cidfile_dir, cidfile_prefix, **kwargs)
runtime.append(img_id)

self._execute(runtime, env, rm_tmpdir=rm_tmpdir, move_outputs=move_outputs)
self._execute(runtime, env, kwargs, document, WorkflowRunID, ProcessProvActivity, reference_locations, rm_tmpdir=rm_tmpdir, move_outputs=move_outputs) #included kwargs to see if the workflow has been executed using the provenance flag.


def _job_popen(
@@ -461,6 +492,7 @@ def _job_popen(
stderr_path=stderr_path,
stdin_path=stdin_path,
)

with open(os.path.join(job_dir, "job.json"), "wb") as f:
json.dump(job_description, codecs.getwriter('utf-8')(f), ensure_ascii=False) # type: ignore
try:
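`_job_popen` opens `job.json` in binary mode and wraps the handle with `codecs.getwriter('utf-8')`, presumably so that `json.dump(..., ensure_ascii=False)` works on both Python 2 and 3. If only Python 3 needs supporting, a text-mode file with an explicit encoding is a simpler equivalent. A minimal sketch, with a hypothetical helper name:

```python
import json
import os
from typing import Any, Dict


def write_job_description(job_dir, job_description):
    # type: (str, Dict[str, Any]) -> str
    """Python 3 only: write job.json as UTF-8, keeping non-ASCII text unescaped."""
    path = os.path.join(job_dir, "job.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(job_description, handle, ensure_ascii=False)
    return path
```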
Empty file.