[Provenance] Store the execution metadata, including command line, when running a job #84

Closed
hmenager opened this issue Jul 9, 2015 · 16 comments


hmenager commented Jul 9, 2015

Maybe also include the generated files.


ghost commented Jul 14, 2015

Is this for the reference implementation, or do you think the spec should cover how to do this consistently?

If the latter, perhaps we could just specify a few properties to extend wfprov. Some regular metadata is already handled by PROV-O, but it might be missing things like the command line and environment setup used. It might be worth adding these properties elsewhere in generic form (EDAM?).
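To make this concrete, here's a sketch of the kind of execution metadata a runner could capture alongside the wfprov statements. All property names here are hypothetical placeholders, not terms from wfprov or PROV-O:

```python
# Sketch of execution metadata a runner might record for a job.
# Every key name below is a hypothetical placeholder.
import datetime
import json
import os
import sys

record = {
    "commandLine": sys.argv,  # argv of the tool invocation
    "cwd": os.getcwd(),
    # a small sample of the environment; a real runner would filter deliberately
    "environment": {k: os.environ[k] for k in sorted(os.environ)[:3]},
    "startedAt": datetime.datetime.now(datetime.timezone.utc).isoformat(),
}
print(json.dumps(record, indent=2))
```

A runner would attach such a record to the corresponding ProcessRun; which keys are worth standardizing is exactly the open question here.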


mr-c commented Oct 26, 2015

+1 for the reference implementation to do so, moved to common-workflow-language/cwltool#9

For the spec, I agree that it would be useful to expose the metadata to the tools and to require that a specific report be available to the user.

@hmenager

+1 on moving this to ref implementation ;)

@mr-c mr-c added this to the Draft 4 milestone Feb 16, 2016
@mr-c mr-c changed the title Store the execution metadata, including command line, when running a job [Provenance] Store the execution metadata, including command line, when running a job Mar 11, 2016

mr-c commented Mar 11, 2016

From @stain in chat

I would like to work on how we can have a somewhat common view of provenance from a CWL workflow run - we can use what we've done in Taverna https://github.com/apache/incubator-taverna-engine/tree/master/taverna-prov#structure-of-exported-provenance and https://w3id.org/ro/2016-01-28/wfprov as a starting point.
I think something like our RO Bundle is needed so you can gather all the files made and used, and include copies of the CWL workflow and tool descriptions at the time of running.
[...]
in tavernaprov we added some more technical things that might not be needed in CWL as they could be engine specific
like we included a way to show a snippet of the value within the graph
but obviously you don't want to do that with larger values


ghost commented Mar 11, 2016

Also from @stain:

perhaps come up with a decent JSON-LD profile for wfprov

+1.


ghost commented Mar 12, 2016

Playing around with structure of run snapshots. Starting with the simplest workflow:

# wf.yaml
class: Workflow
inputs:
  - id: i1
outputs:
  - id: o1
    source: s1.o1
steps:
  - id: s1
    run: tool.yaml
    inputs:
      - id: s1.i1
        source: i1

This one just has a single step - a tool that e.g. increments its input integer.
After running (i1=1), we can get a wfprov:WorkflowRun that might be represented in JSON-LD like:

class: WorkflowRun                          # type
id: "#run1"
process: wf.yaml                            # describedByProcess
enactedBy: http://example.org/engine
inputs:                                     # reverse of usedInput
  - {port: wf.yaml#i1, value: 1}            # port = describedByParameter
outputs:                                    # reverse of wasOutputFrom
  - {port: wf.yaml#o1, value: 2}
runs:                                       # reverse of wasPartOfWorkflowRun
  - class: ProcessRun
    id: "#run1.s1"
    process: wf.yaml#s1                     # Step, not tool.
    enactedBy: http://example.org/engine
    inputs:
      - {port: wf.yaml#s1.i1, value: 1}
    outputs:
      - {port: wf.yaml#s1.o1, value: 2}

Not quite sure how to represent wfprov:Artifacts that are just integers, so here they are objects with a value property.
Also not sure if this is even a valid wfprov:WorkflowRun.
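The structure above can be assembled mechanically; a minimal Python sketch (property names taken from the draft above, the helper itself is hypothetical):

```python
# Build a wfprov-style run record as a plain dict, ready for
# JSON-LD serialization. The make_run helper is a sketch, not an API.
import json

def make_run(run_id, process, inputs, outputs, runs=None, cls="WorkflowRun"):
    """inputs/outputs map port IRIs to values."""
    return {
        "class": cls,
        "id": run_id,
        "process": process,  # wfprov:describedByProcess
        "inputs": [{"port": p, "value": v} for p, v in inputs.items()],
        "outputs": [{"port": p, "value": v} for p, v in outputs.items()],
        "runs": runs or [],  # reverse of wfprov:wasPartOfWorkflowRun
    }

step = make_run("#run1.s1", "wf.yaml#s1",
                {"wf.yaml#s1.i1": 1}, {"wf.yaml#s1.o1": 2},
                cls="ProcessRun")
run = make_run("#run1", "wf.yaml",
               {"wf.yaml#i1": 1}, {"wf.yaml#o1": 2}, runs=[step])
print(json.dumps(run, indent=2))
```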

The trickiness comes when we nest this workflow in an outer workflow and scatter it:

# wf-outer.yaml
class: Workflow
outputs:
  - id: o1
    source: s1.o1
steps:
  - id: s1
    run: wf.yaml
    scatter: s1.i1
    inputs:
      - id: s1.i1
        source: i1

Suppose we run this with [1, 2] as input.
Should do a scattered increment and get [2, 3] as result.
Without any RDF restrictions, I would encode the runs/jobs like this:

- class: WorkflowJob
  id: run1
  app: wf-outer.yaml
  parent: null
  inputs: {i1: [1, 2]}
  outputs: {o1: [2, 3]}
- class: ScatterJob
  id: run1.s1
  app: null
  parent: run1
  inputs: {i1: [1, 2]}
  outputs: {o1: [2, 3]}
- class: WorkflowJob
  id: run1.s1.1
  app: wf.yaml
  parent: run1.s1
  inputs: {i1: 1}
  outputs: {o1: 2}
- class: WorkflowJob
  id: run1.s1.2
  app: wf.yaml
  parent: run1.s1
  inputs: {i1: 2}
  outputs: {o1: 3}
- class: CommandLineJob
  id: run1.s1.1.s1
  app: tool.yaml
  parent: run1.s1.1
  inputs: {i1: 1}
  outputs: {o1: 2}
- class: CommandLineJob
  id: run1.s1.2.s1
  app: tool.yaml
  parent: run1.s1.2
  inputs: {i1: 2}
  outputs: {o1: 3}

There's a root job (run1), a ScatterJob which is a container for iterations,
and two nested WorkflowJobs, each with its own CommandLineJob.

Job identifiers are built semantically: <parent_id>.<step_id> or <parent_id>.<iteration_index>.
This should work since CWL local identifiers should match the [A-Za-z][A-Za-z0-9]* pattern.
An improvement would be to have IRIs (e.g. http://example.org/jobs/run1.s1).
It's only a nicety - IDs can be anything (hopefully URLs though).
Likewise, app references should be URLs.
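The identifier scheme above can be sketched in a few lines (the IRI base is hypothetical):

```python
# Sketch of semantic job identifiers: <parent_id>.<step_id> or
# <parent_id>.<iteration_index>, optionally turned into IRIs.
import re

BASE = "http://example.org/jobs/"  # hypothetical IRI base

# CWL local identifiers should match this pattern, so the
# dot-separated path stays unambiguous.
LOCAL_ID = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")

def child_id(parent_id, step_or_index):
    return f"{parent_id}.{step_or_index}"

def job_iri(job_id):
    return BASE + job_id

run = "run1"
scatter = child_id(run, "s1")        # run1.s1
iteration = child_id(scatter, 1)     # run1.s1.1
step = child_id(iteration, "s1")     # run1.s1.1.s1
print(job_iri(step))
```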

Putting this into wfprov is tricky. The best I could come up with is:

class: WorkflowRun
id: "#run1"
process: wf-outer.yaml
inputs:
  - {port: wf-outer.yaml#i1, valueJson: "[1, 2]"}
outputs:
  - {port: wf-outer.yaml#o1, valueJson: "[2, 3]"}
runs:
  - class: IterationRun
    id: "#run1.s1"
    process: wf-outer.yaml#s1
    inputs:
      - {port: wf-outer.yaml#s1.i1, valueJson: "[1, 2]"}
    outputs:
      - {port: wf-outer.yaml#s1.o1, valueJson: "[2, 3]"}
    runs:
      - class: WorkflowRun
        id: "#run1.s1.1"
        process: wf-outer.yaml#s1
        inputs: [{port: wf.yaml#i1, value: 1}]
        outputs: [{port: wf.yaml#o1, value: 2}]
        runs:
          - class: ProcessRun
            id: "#run1.s1.1.s1"
            process: wf.yaml#s1
            inputs: [{port: wf.yaml#s1.i1, value: 1}]
            outputs: [{port: wf.yaml#s1.o1, value: 2}]
      - class: WorkflowRun
        id: "#run1.s1.2"
        process: wf-outer.yaml#s1
        inputs: [{port: wf.yaml#i1, value: 2}]
        outputs: [{port: wf.yaml#o1, value: 3}]
        runs:
          - class: ProcessRun
            id: "#run1.s1.2.s1"
            process: wf.yaml#s1
            inputs: [{port: wf.yaml#s1.i1, value: 2}]
            outputs: [{port: wf.yaml#s1.o1, value: 3}]

A few issues here:

  • Needed to encode the arrays in a valueJson property. This is a known issue with CWL though (encoding non-primitive values in RDF).
  • IterationRun (not part of wfprov) is a specialization of a ProcessRun, similar to the previously mentioned ScatterJob. Not sure how I would encode the iterations without it.
  • The wfprov:wasPartOfWorkflowRun property doesn't quite fit with IterationRuns.
  • For the nested WorkflowRuns, there's an inconsistency with port properties.
    If this was a non-scattered workflow component, those would point to inputs and outputs of the step in the outer workflow
    (in the previous example ProcessRuns had e.g. {port: wf.yaml#s1.i1, value: 1} rather than {port: tool.yaml#i1, value: 1}).
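The valueJson workaround from the first bullet amounts to a JSON round-trip; a sketch (the helper names are hypothetical):

```python
# Encode non-primitive port values into a valueJson string property,
# and decode them back. Primitives keep a plain value property.
import json

def to_port_value(port, value):
    if isinstance(value, (list, dict)):
        return {"port": port, "valueJson": json.dumps(value)}
    return {"port": port, "value": value}

def from_port_value(entry):
    if "valueJson" in entry:
        return json.loads(entry["valueJson"])
    return entry["value"]

enc = to_port_value("wf-outer.yaml#i1", [1, 2])
print(enc)
```

This keeps the RDF well-formed at the cost of making the array opaque to SPARQL queries, which is the known trade-off mentioned above.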

I'm still confused about wfdesc:Process and how it maps to CWL.
It seems to map to cwl:Step nicely, but it also maps to cwl:Workflow and cwl:CommandLineTool.
Perhaps if I found an example wfdesc:Workflow that has two instances of the same workflow as components the confusion would be cleared :)


stain commented Mar 15, 2016

Great draft, @ntijanic!

It's OK that numbers are also wfprov:Artifact - they don't have to be files. I guess just value (rdf:value?) would make sense. In Taverna's workflow traces there are also UUID-like identifiers on the outputs so they can be cited elsewhere - but I don't think we need to require this. (This would make it easier to see that a workflow output value was the same ITEM as the one at a process run output, not just accidentally the same bytes).

We have to be careful about how we reference the workflow steps so we get the identifiers straight, so how you do process: wf.yaml#s1 should be correct - each ProcessRun is an enactment of the step defined in the workflow (wfdesc:Process).

I guess some confusion comes with the term.. in early wfdesc drafts we used to call all the wfdesc terms *Description to say that they are the plans for execution, so a wfdesc:Workflow is a workflow plan/description, and equivalent for wfdesc:Process. (It was dropped as the namespace already says desc :)

I'll go through the scatter considerations in a separate comment.


stain commented Mar 15, 2016

First of all, perhaps it's useful to look at this recent release candidate for updating the wfdesc and wfprov ontologies. They should be made live some time later this month when we have finished reviewing them - feel free to help!

cwl:CommandLineTool should probably not be a wfdesc:Process as it can be used in several steps, but a wfdesc:ProcessImplementation, referred to with wfdesc:hasImplementation. (this is equivalent to Taverna's Activity concept)

For a wfdesc:Workflow sometimes the equivalent wfdesc:WorkflowDefinition is the same (e.g. the top level Workflow in your example) - but in theory (and in practice for Taverna) the Workflow as a structure can have a common (UUID) identifier no matter where it is saved or included - hence the possibility to keep those distinct. I'm OK with CWL using the same identifier here as it's not a file-format where it's necessarily easy to remember to update such a UUID (as we do in Taverna's GUI).

So the special case is when a workflow is used as a nested workflow twice - for CWL I think the easiest is to make the identifier for each of the inner Workflow instances match the outer cwl:Step - and then both can have wfdesc:hasWorkflowDefinition pointing to the same file. That would mean a bit of trouble in the provenance of the inner workflow runs, as there would be two runs with the same parameter names appearing.

If a cwl:Step is a nested workflow, then the equivalent wfdesc:ProcessRun gets 'upgraded' to a wfdesc:WorkflowRun, which would have inner ProcessRuns linked to it with wfprov:wasPartOfWorkflowRun. So you can climb back out from that to find which process run (and in your model then eventually also which IterationRun) that particular workflow run was part of.

But on the other hand, if you wanted to know "all values seen at a given port" across all scatter runs, this would still be easy as they would all be having the same describedByParameter link.


stain commented Mar 15, 2016

I quite like the IterationRun idea! I see how it messes up the ports as it doesn't have its own ports (actually, internally in Taverna's engine the iteration strategy DOES have its own ports, but we don't assign those identifiers in the provenance) - but I don't see a big problem with saying this is an intermediate step as long as you have distinct identifiers for each inner run that it wraps.

So I would just use the same port parameters on both - it's not very different from the issue with nested workflow ports - where a workflow input port is both an outer receiver port and inner sender port.


stain commented Mar 16, 2016

Here's a bit of motivation for wfprov based on https://gist.github.com/stain/fa63a3527bd09be2a42d

Here a particular run 6a6ffc3d of processor join_cd used two values at ports first and second:

<run/process/6a6ffc3d-47cd-48c4-9a6e-8451481f15e3/>     a prov:Activity;
     prov:qualifiedAssociation  [ 
           prov:agent <workflowrun.prov.ttl#taverna-engine>; 
          # which step was run
           prov:hadPlan <wf/workflow/Hello_abcd/processor/join_cd/> ];

     prov:qualifiedUsage  
          [  # data 293661a3 used as input 'first'
            prov:entity <data/ref/293661a3-3e4e-4ff2-aff8-f8513a19a04a>;
            prov:hadRole <wf/workflow/Hello_abcd/processor/join_cd/in/first> ],
          [ # data 50d03488 used as input 'second'
            prov:entity <data/ref/50d03488-50e2-4a5b-9dc6-28579b24fcef>;
            prov:hadRole <wf/workflow/Hello_abcd/processor/join_cd/in/second> ];

So yes, it's clear here which value was used in which port, but it's a bit elaborate as we have to invoke prov:hadRole to qualify the prov:used relation.

wfprov is more descriptive with shortcuts like describedByProcess and usedInput:

<run/process/6a6ffc3d-47cd-48c4-9a6e-8451481f15e3/>     a wfprov:WorkflowRun;
    wfprov:describedByProcess <wf/workflow/Hello_abcd/processor/join_cd/>;
    wfprov:usedInput <data/ref/293661a3-3e4e-4ff2-aff8-f8513a19a04a>,
            <data/ref/50d03488-50e2-4a5b-9dc6-28579b24fcef>;

However if you care about at which input port, you would have to combine the wfdesc Workflow listing of its hasInput:

<wf/workflow/Hello_abcd/processor/join_cd/> a wfdesc:Workflow ;
      rdfs:label "join_cd" ;
    wfdesc:hasInput <processor/join_cd/in/first> , <processor/join_cd/in/second> ;
    wfdesc:hasOutput <processor/join_cd/out/joint>  .

and the ports the artifact appeared at:

<data/ref/50d03488-50e2-4a5b-9dc6-28579b24fcef>     a wfprov:Artifact ;
  wfprov:describedByParameter <wf/workflow/Hello_abcd/processor/join_cd/in/second>,
            <wf/workflow/join_ab/processor/concatenate/in/string2> .


stain commented Mar 16, 2016

See https://gist.github.com/stain/fa63a3527bd09be2a42d#file-helloabcd-wfprov-ttl for an abbreviated example of wfprov from Taverna of a workflow that runs the same inserted nested workflow twice, iterating over it twice, which should mean 4 distinct calls to the inner "concatenate" process.

I've shortened the URLs and removed the "duplicate" statements in pure PROV - see the rest of the gist for the gory details - or download the run bundle ZIP for the actual provenance output.

I don't think the CWL output needs to enforce a distinction between the data 'item' and its file with tavernaprov:content as we did in Taverna - but one reason we did it is that it would be wrong to claim that the serialized file was passed along in the workflow when the value only existed in memory as a data structure. (In Taverna provenance, those intermediate value files are saved out afterwards.)


stain commented Mar 16, 2016

I did a quick translate of the helloabcd wfprov as JSON-LD.

Can be shortened further!


ghost commented Mar 16, 2016

Wow. Thanks @stain, this clarifies a lot for me.

I'll need time to go over it in detail; some questions/comments meanwhile:

But on the other hand, if you wanted to know "all values seen at a given port" across all scatter runs, this would still be easy as they would all be having the same describedByParameter link.

Ah, so describedByParameter, usedInput and wasOutputFrom don't necessarily mean that it was the exact value on that port - it could mean they were part of the value on that port? Similarly, data/list/Hello-ab is a list and references its members through hadMember. But where do you encode the actual list value (preserving the ordering of list items)?

I don't think the CWL output need to enforce a distinction between the data 'item' and its file

Similar issue to above, since CWL's artifacts can be arbitrary dicts-and-lists trees which can contain files as leaves. Perhaps we can somehow store the tree itself, plus a flat set of all contained files/artifacts through hadMember.
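Collecting that flat set of file leaves could look like this (the `class: File` convention is CWL's; the walker itself is a sketch and ignores details like secondaryFiles):

```python
# Walk an arbitrary dicts-and-lists tree and yield every CWL File
# object found at the leaves - the candidates for prov:hadMember.
def collect_files(value):
    if isinstance(value, dict):
        if value.get("class") == "File":
            yield value
        else:
            for v in value.values():
                yield from collect_files(v)
    elif isinstance(value, list):
        for v in value:
            yield from collect_files(v)

tree = {"pair": [{"class": "File", "path": "a.txt"},
                 {"meta": {"class": "File", "path": "b.txt"}}]}
paths = [f["path"] for f in collect_files(tree)]
print(paths)
```

The tree itself would then need to be stored separately (e.g. as JSON), with the flat file set linked via hadMember, as suggested above.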

I really like the idea of wrapping everything in an Artifact to signify that it's not just any 42, it's the same 42 that was the output from previous step.


stain commented Mar 16, 2016

No, we used describedByParameter to say it was the actual final value at the port - in earlier approaches we had "seen at port (possibly part of a list)" - but this made it very difficult to untangle the list and its members when trying to show the data afterwards as you would have to query for the "highest level list that doesn't directly or indirectly contain the others".

(or as happened at one point - we didn't track the lists at all, which meant downstream consumers of the list seemed to get the list out of thin air)

But if you have two different parameters, then it would no longer be tangled - so you would then come back to the issue of minting separate identifiers for the inner and outer ports of the IterationRun.

So instead we ended up declaring the list members using the cumbersome prov:Dictionary approach:

<http://ns.taverna.org.uk/2011/data/60bce245-9f15-4391-902f-134bc699c583/list/71f5d67b-195d-49bb-94c0-1b800d18f72a/false/1>     
    a wfprov:Artifact, prov:Collection, prov:Dictionary, prov:Entity;
    # The individual values are **not** described by this parameter
     wfprov:describedByParameter <http://ns.taverna.org.uk/2010/workflowBundle/f5fddb99-9677-43cb-aac5-4d14e1ad1b46/workflow/Hello_abcd/processor/a_b/out/list>;
     wfprov:wasOutputFrom <http://ns.taverna.org.uk/2011/run/60bce245-9f15-4391-902f-134bc699c583/process/6680bce1-8b12-411f-841c-84df7b25cefb/>;
# Short form - no order:
     prov:hadMember <http://ns.taverna.org.uk/2011/data/60bce245-9f15-4391-902f-134bc699c583/ref/0f086ff4-c030-42c3-ab05-ca5fca72e432>,
            <http://ns.taverna.org.uk/2011/data/60bce245-9f15-4391-902f-134bc699c583/ref/32eba706-636a-47c3-9607-a7712d7a0a49>;
# Longer form - index as an `xsd:long`
     prov:hadDictionaryMember
        [ a prov:KeyEntityPair;
           prov:pairEntity <http://ns.taverna.org.uk/2011/data/60bce245-9f15-4391-902f-134bc699c583/ref/0f086ff4-c030-42c3-ab05-ca5fca72e432>;
           prov:pairKey "1"^^xsd:long ],
        [ a prov:KeyEntityPair;
           prov:pairEntity <http://ns.taverna.org.uk/2011/data/60bce245-9f15-4391-902f-134bc699c583/ref/32eba706-636a-47c3-9607-a7712d7a0a49>;
           prov:pairKey "0"^^xsd:long ];

A list of lists (depth 2) would be done by another such prov:Dictionary listed as the prov:pairEntity - which become quite deep.
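That nesting can be generated mechanically; a sketch that turns a (possibly nested) Python list into the KeyEntityPair structure above, with fresh blank-node-style ids for the inner dictionaries (all identifiers hypothetical):

```python
# Encode a nested list as a prov:Dictionary-style structure.
# Leaves are returned as-is (in real output they would be artifact IRIs).
import itertools

_counter = itertools.count()

def to_dictionary(value):
    if not isinstance(value, list):
        return value
    return {
        "@id": f"_:list{next(_counter)}",  # hypothetical blank-node id
        "@type": ["prov:Dictionary", "prov:Collection", "wfprov:Artifact"],
        "prov:hadDictionaryMember": [
            {"@type": "prov:KeyEntityPair",
             "prov:pairKey": i,  # would be "i"^^xsd:long in RDF
             "prov:pairEntity": to_dictionary(item)}
            for i, item in enumerate(value)
        ],
    }

d = to_dictionary([["a", "b"], ["c"]])
```

Each level of nesting adds another prov:Dictionary as the pairEntity, which illustrates how deep a depth-2 list already gets compared to a native JSON-LD @list.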

With JSON-LD you can use native JSON lists and make them into a proper RDF List - however in PROV-O there was resistance against RDF lists as they could be tricky to support in the other PROV serializations. So to do that you would need a new property - perhaps it would be appropriate to include in wfdesc as I don't like the above.

But there could be an argument for allowing describedByParameter also on the list items, such as when there is pipelining/item streaming and values are delivered before the final list - then the provenance could look inconsistent with a list being generated after its values have been consumed downstream.

@mr-c mr-c modified the milestones: unscheduled, Draft 4 Jun 7, 2016
tetron pushed a commit that referenced this issue Jul 20, 2017
tetron pushed a commit that referenced this issue Mar 21, 2018

stain commented Apr 9, 2018

See how @FarahZKhan and myself implemented this in common-workflow-language/cwltool#676 - described (and example research object) in https://doi.org/10.5281/zenodo.1208478

stain added a commit to common-workflow-language/cwlprov that referenced this issue Jul 5, 2018

mr-c commented Jun 29, 2019

Do we want to officially recommend or suggest the use of CWLProv?

@mr-c mr-c closed this as completed Aug 31, 2021