[Provenance] Store the execution metadata, including command line, when running a job #84

Closed
hmenager opened this issue Jul 9, 2015 · 16 comments


hmenager commented Jul 9, 2015

Maybe also include the generated files.


ghost commented Jul 14, 2015

Is this for the reference implementation, or do you think the spec should cover how to do this consistently?

If the latter, perhaps we could just specify a few properties to extend wfprov. Some regular metadata is already handled by PROV-O, but it might be missing things like the command line and environment setup used. It might be worth adding these properties elsewhere in generic form (EDAM?).
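To make this concrete, here's a sketch of the kind of execution metadata a runner could capture alongside the wfprov statements. All property names here are hypothetical placeholders, not terms from wfprov or PROV-O:

```python
# Sketch of execution metadata a runner might record for a job.
# Every key name below is a hypothetical placeholder.
import datetime
import json
import os
import sys

record = {
    "commandLine": sys.argv,  # argv of the tool invocation
    "cwd": os.getcwd(),
    # a small sample of the environment; a real runner would filter deliberately
    "environment": {k: os.environ[k] for k in sorted(os.environ)[:3]},
    "startedAt": datetime.datetime.now(datetime.timezone.utc).isoformat(),
}
print(json.dumps(record, indent=2))
```

A runner would attach such a record to the corresponding ProcessRun; which keys are worth standardizing is exactly the open question here.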


mr-c commented Oct 26, 2015

+1 for the reference implementation to do so, moved to common-workflow-language/cwltool#9

For the spec, I agree that it would be useful to expose the metadata to the tools and to require that a specific report be available to the user.

@hmenager

+1 on moving this to ref implementation ;)

@mr-c mr-c added this to the Draft 4 milestone Feb 16, 2016
@mr-c mr-c changed the title Store the execution metadata, including command line, when running a job [Provenance] Store the execution metadata, including command line, when running a job Mar 11, 2016

mr-c commented Mar 11, 2016

From @stain in chat

I would like to work on how we can have a somewhat common view of provenance from a CWL workflow run - we can use what we've done in Taverna https://github.com/apache/incubator-taverna-engine/tree/master/taverna-prov#structure-of-exported-provenance and https://w3id.org/ro/2016-01-28/wfprov as a starting point.
I think something like our RO Bundle is needed so you can gather all the files made and used, and include copies of the CWL workflow and tool descriptions at the time of running.
[...]
in tavernaprov we added some more technical things that might not be needed in CWL as they could be engine specific
like we included a way to show a snippet of the value within the graph
but obviously you don't want to do that with larger values


ghost commented Mar 11, 2016

Also from @stain:

perhaps come up with a decent JSON-LD profile for wfprov

+1.


ghost commented Mar 12, 2016

Playing around with structure of run snapshots. Starting with the simplest workflow:

# wf.yaml
class: Workflow
inputs:
  - id: i1
outputs:
  - id: o1
    source: s1.o1
steps:
  - id: s1
    run: tool.yaml
    inputs:
      - id: s1.i1
        source: i1

This one just has a single step - a tool that e.g. increments its input integer.
After running (i1=1), we can get a wfprov:WorkflowRun that might be represented in JSON-LD like:

class: WorkflowRun                          # type
id: "#run1"
process: wf.yaml                            # describedByProcess
enactedBy: http://example.org/engine
inputs:                                     # reverse of usedInput
  - {port: wf.yaml#i1, value: 1}            # port = describedByParameter
outputs:                                    # reverse of wasOutputFrom
  - {port: wf.yaml#o1, value: 2}
runs:                                       # reverse of wasPartOfWorkflowRun
  - class: ProcessRun
    id: "#run1.s1"
    process: wf.yaml#s1                     # Step, not tool.
    enactedBy: http://example.org/engine
    inputs:
      - {port: wf.yaml#s1.i1, value: 1}
    outputs:
      - {port: wf.yaml#s1.o1, value: 2}

Not quite sure how to represent wfprov:Artifacts that are just integers, so here they are objects with a value property.
Also not sure if this is even a valid wfprov:WorkflowRun.
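The structure above can be assembled mechanically; a minimal Python sketch (property names taken from the draft above, the helper itself is hypothetical):

```python
# Build a wfprov-style run record as a plain dict, ready for
# JSON-LD serialization. The make_run helper is a sketch, not an API.
import json

def make_run(run_id, process, inputs, outputs, runs=None, cls="WorkflowRun"):
    """inputs/outputs map port IRIs to values."""
    return {
        "class": cls,
        "id": run_id,
        "process": process,  # wfprov:describedByProcess
        "inputs": [{"port": p, "value": v} for p, v in inputs.items()],
        "outputs": [{"port": p, "value": v} for p, v in outputs.items()],
        "runs": runs or [],  # reverse of wfprov:wasPartOfWorkflowRun
    }

step = make_run("#run1.s1", "wf.yaml#s1",
                {"wf.yaml#s1.i1": 1}, {"wf.yaml#s1.o1": 2},
                cls="ProcessRun")
run = make_run("#run1", "wf.yaml",
               {"wf.yaml#i1": 1}, {"wf.yaml#o1": 2}, runs=[step])
print(json.dumps(run, indent=2))
```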

The trickiness comes when we nest this workflow in an outer workflow and scatter it:

# wf-outer.yaml
class: Workflow
outputs:
  - id: o1
    source: s1.o1
steps:
  - id: s1
    run: wf.yaml
    scatter: s1.i1
    inputs:
      - id: s1.i1
        source: i1

Suppose we run this with [1, 2] as input.
Should do a scattered increment and get [2, 3] as result.
Without any RDF restrictions, I would encode the runs/jobs like this:

- class: WorkflowJob
  id: run1
  app: wf-outer.yaml
  parent: null
  inputs: {i1: [1, 2]}
  outputs: {o1: [2, 3]}
- class: ScatterJob
  id: run1.s1
  app: null
  parent: run1
  inputs: {i1: [1, 2]}
  outputs: {o1: [2, 3]}
- class: WorkflowJob
  id: run1.s1.1
  app: wf.yaml
  parent: run1.s1
  inputs: {i1: 1}
  outputs: {o1: 2}
- class: WorkflowJob
  id: run1.s1.2
  app: wf.yaml
  parent: run1.s1
  inputs: {i1: 2}
  outputs: {o1: 3}
- class: CommandLineJob
  id: run1.s1.1.s1
  app: tool.yaml
  parent: run1.s1.1
  inputs: {i1: 1}
  outputs: {o1: 2}
- class: CommandLineJob
  id: run1.s1.2.s1
  app: tool.yaml
  parent: run1.s1.2
  inputs: {i1: 2}
  outputs: {o1: 3}

There's a root job (run1), a ScatterJob which is a container for iterations,
and two nested WorkflowJobs, each with its own CommandLineJob.

Job identifiers are built semantically: <parent_id>.<step_id> or <parent_id>.<iteration_index>.
This should work since CWL local identifiers should match the [A-Za-z][A-Za-z0-9]* pattern.
An improvement would be to have IRIs (e.g. http://example.org/jobs/run1.s1).
It's only a nicety - IDs can be anything (hopefully URLs though).
Likewise, app references should be URLs.
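The identifier scheme above can be sketched in a few lines (the IRI base is hypothetical):

```python
# Sketch of semantic job identifiers: <parent_id>.<step_id> or
# <parent_id>.<iteration_index>, optionally turned into IRIs.
import re

BASE = "http://example.org/jobs/"  # hypothetical IRI base

# CWL local identifiers should match this pattern, so the
# dot-separated path stays unambiguous.
LOCAL_ID = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")

def child_id(parent_id, step_or_index):
    return f"{parent_id}.{step_or_index}"

def job_iri(job_id):
    return BASE + job_id

run = "run1"
scatter = child_id(run, "s1")        # run1.s1
iteration = child_id(scatter, 1)     # run1.s1.1
step = child_id(iteration, "s1")     # run1.s1.1.s1
print(job_iri(step))
```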

Putting this into wfprov is tricky. The best I could come up with is:

class: WorkflowRun
id: "#run1"
process: wf-outer.yaml
inputs:
  - {port: wf-outer.yaml#i1, valueJson: "[1, 2]"}
outputs:
  - {port: wf-outer.yaml#o1, valueJson: "[2, 3]"}
runs:
  - class: IterationRun
    id: "#run1.s1"
    process: wf-outer.yaml#s1
    inputs:
      - {port: wf-outer.yaml#s1.i1, valueJson: "[1, 2]"}
    outputs:
      - {port: wf-outer.yaml#s1.o1, valueJson: "[2, 3]"}
    runs:
      - class: WorkflowRun
        id: "#run1.s1.1"
        process: wf-outer.yaml#s1
        inputs: [{port: wf.yaml#i1, value: 1}]
        outputs: [{port: wf.yaml#o1, value: 2}]
        runs:
          - class: ProcessRun
            id: "#run1.s1.1.s1"
            process: wf.yaml#s1
            inputs: [{port: wf.yaml#s1.i1, value: 1}]
            outputs: [{port: wf.yaml#s1.o1, value: 2}]
      - class: WorkflowRun
        id: "#run1.s1.2"
        process: wf-outer.yaml#s1
        inputs: [{port: wf.yaml#i1, value: 2}]
        outputs: [{port: wf.yaml#o1, value: 3}]
        runs:
          - class: ProcessRun
            id: "#run1.s1.2.s1"
            process: wf.yaml#s1
            inputs: [{port: wf.yaml#s1.i1, value: 2}]
            outputs: [{port: wf.yaml#s1.o1, value: 3}]

A few issues here:

  • Needed to encode the arrays in a valueJson property. This is a known issue with CWL though (encoding non-primitive values in RDF).
  • IterationRun (not part of wfprov) is a specialization of a ProcessRun, similar to the previously mentioned ScatterJob. Not sure how I would encode the iterations without it.
  • The wfprov:wasPartOfWorkflowRun property doesn't quite fit with IterationRuns.
  • For the nested WorkflowRuns, there's an inconsistency with port properties.
    If this was a non-scattered workflow component, those would point to inputs and outputs of the step in the outer workflow
    (in the previous example ProcessRuns had e.g. {port: wf.yaml#s1.i1, value: 1} rather than {port: tool.yaml#i1, value: 1}).
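The valueJson workaround from the first bullet amounts to a JSON round-trip; a sketch (the helper names are hypothetical):

```python
# Encode non-primitive port values into a valueJson string property,
# and decode them back. Primitives keep a plain value property.
import json

def to_port_value(port, value):
    if isinstance(value, (list, dict)):
        return {"port": port, "valueJson": json.dumps(value)}
    return {"port": port, "value": value}

def from_port_value(entry):
    if "valueJson" in entry:
        return json.loads(entry["valueJson"])
    return entry["value"]

enc = to_port_value("wf-outer.yaml#i1", [1, 2])
print(enc)
```

This keeps the RDF well-formed at the cost of making the array opaque to SPARQL queries, which is the known trade-off mentioned above.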

I'm still confused about wfdesc:Process and how it maps to CWL.
It seems to map to cwl:Step nicely, but it also maps to cwl:Workflow and cwl:CommandLineTool.
Perhaps if I found an example wfdesc:Workflow that has two instances of the same workflow as components the confusion would be cleared :)


stain commented Mar 15, 2016

Great draft, @ntijanic!

It's OK that numbers are also wfprov:Artifact - they don't have to be files. I guess just value (rdf:value?) would make sense. In Taverna's workflow traces there are also UUID-like identifiers on the outputs so they can be cited elsewhere - but I don't think we need to require this. (This would make it easier to see that a workflow output value was the same ITEM as the one at a process run output, not just accidentally the same bytes).

We have to be careful about how we reference the workflow steps so we get the identifiers straight, so how you do process: wf.yaml#s1 should be correct - each ProcessRun is an enactment of the step defined in the workflow (wfdesc:Process).

I guess some confusion comes with the term.. in early wfdesc drafts we used to call all the wfdesc terms *Description to say that they are the plans for execution, so a wfdesc:Workflow is a workflow plan/description, and equivalent for wfdesc:Process. (It was dropped as the namespace already says desc :)

I'll go through the scatter considerations in a separate comment.


stain commented Mar 15, 2016

First of all, perhaps it's useful to look at this recent release candidate for updating the wfdesc and wfprov ontologies. They should be made live some time later this month when we have finished reviewing them - feel free to help!

cwl:CommandLineTool should probably not be a wfdesc:Process as it can be used in several steps, but a wfdesc:ProcessImplementation, referred to with wfdesc:hasImplementation. (this is equivalent to Taverna's Activity concept)

For a wfdesc:Workflow sometimes the equivalent wfdesc:WorkflowDefinition is the same (e.g. the top level Workflow in your example) - but in theory (and in practice for Taverna) the Workflow as a structure can have a common (UUID) identifier no matter where it is saved or included - hence the possibility to keep those distinct. I'm OK with CWL using the same identifier here as it's not a file-format where it's necessarily easy to remember to update such a UUID (as we do in Taverna's GUI).

So the special case is when a workflow is used as a nested workflow twice - for CWL I think the easiest is to make the identifier for each of the inner Workflow instances match the outer cwl:Step - and then both can have wfdesc:hasWorkflowDefinition pointing to the same file. That would mean a bit of trouble in the provenance of the inner workflow runs, as there would be two runs with the same parameter names appearing.

If a cwl:Step is a nested workflow, then the equivalent wfdesc:ProcessRun gets 'upgraded' to a wfdesc:WorkflowRun, which would have inner ProcessRuns linked to it with wfprov:wasPartOfWorkflowRun. So you can climb back out from that to find which process run (and in your model then eventually also which IterationRun) that particular workflow run was part of.

But on the other hand, if you wanted to know "all values seen at a given port" across all scatter runs, this would still be easy as they would all be having the same describedByParameter link.


stain commented Mar 15, 2016

I quite like the IterationRun idea! I see how it messes up the ports as it doesn't have its own ports (actually, internally in Taverna's engine the iteration strategy DOES have its own ports, but we don't assign those identifiers in the provenance) - but I don't see a big problem with saying this is an intermediate step as long as you have distinct identifiers for each inner run that it wraps.

So I would just use the same port parameters on both - it's not very different from the issue with nested workflow ports - where a workflow input port is both an outer receiver port and inner sender port.


stain commented Mar 16, 2016

Here's a bit of motivation for wfprov based on https://gist.github.com/stain/fa63a3527bd09be2a42d

Here a particular run 6a6ffc3d of processor join_cd used two values at ports first and second:

<run/process/6a6ffc3d-47cd-48c4-9a6e-8451481f15e3/>     a prov:Activity;
     prov:qualifiedAssociation  [ 
           prov:agent <workflowrun.prov.ttl#taverna-engine>; 
          # which step was run
           prov:hadPlan <wf/workflow/Hello_abcd/processor/join_cd/> ];

     prov:qualifiedUsage  
          [  # data 293661a3 used as input 'first'
            prov:entity <data/ref/293661a3-3e4e-4ff2-aff8-f8513a19a04a>;
            prov:hadRole <wf/workflow/Hello_abcd/processor/join_cd/in/first> ],
          [ # data 50d03488 used as input 'second'
            prov:entity <data/ref/50d03488-50e2-4a5b-9dc6-28579b24fcef>;
            prov:hadRole <wf/workflow/Hello_abcd/processor/join_cd/in/second> ];

So yes, it's clear here which value was used in which port, but it's a bit elaborate as we have to invoke prov:hadRole to qualify the prov:used relation.

wfprov is more descriptive with shortcuts like describedByProcess and usedInput:

<run/process/6a6ffc3d-47cd-48c4-9a6e-8451481f15e3/>     a wfprov:WorkflowRun;
    wfprov:describedByProcess <wf/workflow/Hello_abcd/processor/join_cd/>;
    wfprov:usedInput <data/ref/293661a3-3e4e-4ff2-aff8-f8513a19a04a>,
            <data/ref/50d03488-50e2-4a5b-9dc6-28579b24fcef>;

However if you care about at which input port, you would have to combine the wfdesc Workflow listing of its hasInput:

<wf/workflow/Hello_abcd/processor/join_cd/> a wfdesc:Workflow ;
      rdfs:label "join_cd" ;
    wfdesc:hasInput <processor/join_cd/in/first> , <processor/join_cd/in/second> ;
    wfdesc:hasOutput <processor/join_cd/out/joint>  .

and the ports the artifact appeared at:

<data/ref/50d03488-50e2-4a5b-9dc6-28579b24fcef>     a wfprov:Artifact ;
  wfprov:describedByParameter <wf/workflow/Hello_abcd/processor/join_cd/in/second>,
            <wf/workflow/join_ab/processor/concatenate/in/string2> .


stain commented Mar 16, 2016

See https://gist.github.com/stain/fa63a3527bd09be2a42d#file-helloabcd-wfprov-ttl for an abbreviated example of wfprov from Taverna of a workflow that runs the same inserted nested workflow twice, iterating over it twice, which should mean 4 distinct calls to the inner "concatenate" process.

I've shortened the URLs and removed the "duplicate" statements in pure PROV - see the rest of the gist for the gory details - or download the run bundle ZIP for the actual provenance output.

I don't think the CWL output needs to enforce a distinction between the data 'item' and its file with tavernaprov:content as we did in Taverna - but one reason we did it is that it would be wrong to claim that the serialized file was passed along in the workflow when the value only existed in memory as a data structure. (In Taverna provenance, those intermediate value files are saved out afterwards.)


stain commented Mar 16, 2016

I did a quick translate of the helloabcd wfprov as JSON-LD.

Can be shortened further!


ghost commented Mar 16, 2016

Wow. Thanks @stain, this clarifies a lot for me.

I'll need time to go over it in detail; some questions/comments meanwhile:

But on the other hand, if you wanted to know "all values seen at a given port" across all scatter runs, this would still be easy as they would all be having the same describedByParameter link.

Ah, so describedByParameter, usedInput and wasOutputFrom don't necessarily mean that it was the exact value on that port - it could mean they were part of the value on that port? Similarly, data/list/Hello-ab is a list and references its members through hadMember. But where do you encode the actual list value (preserving the ordering of list items)?

I don't think the CWL output need to enforce a distinction between the data 'item' and its file

Similar issue to above, since CWL's artifacts can be arbitrary dicts-and-lists trees which can contain files as leaves. Perhaps we can somehow store the tree itself, plus a flat set of all contained files/artifacts through hadMember.
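Collecting that flat set of file leaves could look like this (the `class: File` convention is CWL's; the walker itself is a sketch and ignores details like secondaryFiles):

```python
# Walk an arbitrary dicts-and-lists tree and yield every CWL File
# object found at the leaves - the candidates for prov:hadMember.
def collect_files(value):
    if isinstance(value, dict):
        if value.get("class") == "File":
            yield value
        else:
            for v in value.values():
                yield from collect_files(v)
    elif isinstance(value, list):
        for v in value:
            yield from collect_files(v)

tree = {"pair": [{"class": "File", "path": "a.txt"},
                 {"meta": {"class": "File", "path": "b.txt"}}]}
paths = [f["path"] for f in collect_files(tree)]
print(paths)
```

The tree itself would then need to be stored separately (e.g. as JSON), with the flat file set linked via hadMember, as suggested above.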

I really like the idea of wrapping everything in an Artifact to signify that it's not just any 42, it's the same 42 that was the output from previous step.


stain commented Mar 16, 2016

No, we used describedByParameter to say it was the actual final value at the port - in earlier approaches we had "seen at port (possibly part of a list)" - but this made it very difficult to untangle the list and its members when trying to show the data afterwards as you would have to query for the "highest level list that doesn't directly or indirectly contain the others".

(or as happened at one point - we didn't track the lists at all, which meant downstream consumers of the list seemed to get the list out of thin air)

But if you have two different parameters, then it would no longer be tangled - so you would then come back to the issue of minting separate identifiers for the inner and outer ports of the IterationRun.

So instead we ended up declaring the list members using the cumbersome prov:Dictionary approach:

<http://ns.taverna.org.uk/2011/data/60bce245-9f15-4391-902f-134bc699c583/list/71f5d67b-195d-49bb-94c0-1b800d18f72a/false/1>     
    a wfprov:Artifact, prov:Collection, prov:Dictionary, prov:Entity;
    # The individual values are **not** described by this parameter
     wfprov:describedByParameter <http://ns.taverna.org.uk/2010/workflowBundle/f5fddb99-9677-43cb-aac5-4d14e1ad1b46/workflow/Hello_abcd/processor/a_b/out/list>;
     wfprov:wasOutputFrom <http://ns.taverna.org.uk/2011/run/60bce245-9f15-4391-902f-134bc699c583/process/6680bce1-8b12-411f-841c-84df7b25cefb/>;
# Short form - no order:
     prov:hadMember <http://ns.taverna.org.uk/2011/data/60bce245-9f15-4391-902f-134bc699c583/ref/0f086ff4-c030-42c3-ab05-ca5fca72e432>,
            <http://ns.taverna.org.uk/2011/data/60bce245-9f15-4391-902f-134bc699c583/ref/32eba706-636a-47c3-9607-a7712d7a0a49>;
# Longer form - index as an `xsd:long`
     prov:hadDictionaryMember
        [ a prov:KeyEntityPair;
           prov:pairEntity <http://ns.taverna.org.uk/2011/data/60bce245-9f15-4391-902f-134bc699c583/ref/0f086ff4-c030-42c3-ab05-ca5fca72e432>;
           prov:pairKey "1"^^xsd:long ],
        [ a prov:KeyEntityPair;
           prov:pairEntity <http://ns.taverna.org.uk/2011/data/60bce245-9f15-4391-902f-134bc699c583/ref/32eba706-636a-47c3-9607-a7712d7a0a49>;
           prov:pairKey "0"^^xsd:long ];

A list of lists (depth 2) would be done by another such prov:Dictionary listed as the prov:pairEntity - which become quite deep.
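That nesting can be generated mechanically; a sketch that turns a (possibly nested) Python list into the KeyEntityPair structure above, with fresh blank-node-style ids for the inner dictionaries (all identifiers hypothetical):

```python
# Encode a nested list as a prov:Dictionary-style structure.
# Leaves are returned as-is (in real output they would be artifact IRIs).
import itertools

_counter = itertools.count()

def to_dictionary(value):
    if not isinstance(value, list):
        return value
    return {
        "@id": f"_:list{next(_counter)}",  # hypothetical blank-node id
        "@type": ["prov:Dictionary", "prov:Collection", "wfprov:Artifact"],
        "prov:hadDictionaryMember": [
            {"@type": "prov:KeyEntityPair",
             "prov:pairKey": i,  # would be "i"^^xsd:long in RDF
             "prov:pairEntity": to_dictionary(item)}
            for i, item in enumerate(value)
        ],
    }

d = to_dictionary([["a", "b"], ["c"]])
```

Each level of nesting adds another prov:Dictionary as the pairEntity, which illustrates how deep a depth-2 list already gets compared to a native JSON-LD @list.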

With JSON-LD you can use native JSON lists and make them into a proper RDF List - however in PROV-O there was resistance against RDF lists as they could be tricky to support in the other PROV serializations. So to do that you would need a new property - perhaps it would be appropriate to include in wfdesc as I don't like the above.

But there could be an argument for allowing describedByParameter also on the list items, such as when there is pipelining/item streaming and values are delivered before the final list - then the provenance could look inconsistent with a list being generated after its values have been consumed downstream.

@mr-c mr-c modified the milestones: unscheduled, Draft 4 Jun 7, 2016
tetron pushed a commit that referenced this issue Jul 20, 2017
tetron pushed a commit that referenced this issue Mar 21, 2018

stain commented Apr 9, 2018

See how @FarahZKhan and myself implemented this in common-workflow-language/cwltool#676 - described (and example research object) in https://doi.org/10.5281/zenodo.1208478

stain added a commit to common-workflow-language/cwlprov that referenced this issue Jul 5, 2018

mr-c commented Jun 29, 2019

Do we want to officially recommend or suggest the use of CWLProv?

@mr-c mr-c closed this as completed Aug 31, 2021