planning for CWLProv in toil-cwl-runner #2390

Open
mr-c opened this Issue Oct 2, 2018 · 2 comments

mr-c commented Oct 2, 2018

  • Refactor CWLJob.run() to return (outputs, metadata) instead of just outputs. metadata is a dictionary that will contain the information we need for generating CWLProv.

  • Propagate the metadata through the .run() calls to the root of the computation

  • Try to reuse Toil's jobstore IDs (see #2449): for each CWLJob, record this ID and the parent ID.

  • Fill metadata with a data structure containing runtime information about the tasks (tree or dict, with the keys being the jobstore IDs)

  • Generate a ProvenanceProfile per task and a ResearchObject when all the metadata has been gathered.

  • Refactor cwltool/provenance.py so that recorded time and time of recording are decoupled.

  • Refactor ProvenanceProfile:prospective_prov out of the class to be the function that creates all the ProvenanceProfiles and relates them in a tree-like structure.

  • Refactor cwltool/provenance.py so that we can defer file movements until the end of the run

  • Update Toil to use cwltool with the fixes (#2469)
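The first few items (return metadata from run(), propagate it to the root, key it by jobstore ID and record the parent ID) could look roughly like the sketch below. This is only an illustration: `SketchJob`, its string IDs, and the shape of the metadata dictionary are hypothetical stand-ins, not Toil's actual `CWLJob` or jobstore API.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical shape: one entry per job, keyed by its (assumed) jobstore ID.
Metadata = Dict[str, Dict[str, Any]]


class SketchJob:
    """Hypothetical stand-in for CWLJob; not Toil's real class."""

    def __init__(self, job_id: str,
                 children: Optional[List["SketchJob"]] = None) -> None:
        self.job_id = job_id
        self.parent_id: Optional[str] = None
        self.children = children or []
        for child in self.children:
            child.parent_id = job_id  # record the parent ID on each child

    def run(self) -> Tuple[Dict[str, Any], Metadata]:
        """Return (outputs, metadata) instead of just outputs."""
        outputs: Dict[str, Any] = {self.job_id: "done"}
        metadata: Metadata = {self.job_id: {"parent": self.parent_id}}
        for child in self.children:
            child_outputs, child_metadata = child.run()
            outputs.update(child_outputs)
            # Merge child metadata so it reaches the root of the computation.
            metadata.update(child_metadata)
        return outputs, metadata


root = SketchJob("root", [
    SketchJob("step-a"),
    SketchJob("step-b", [SketchJob("step-b1")]),
])
outputs, metadata = root.run()
```

A flat dict with a "parent" field per entry is used here because the tree can be rebuilt from it once all metadata has been gathered at the root; a nested tree would work as well.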

Most of the progress so far is on the wip-prov branch: https://github.com/DataBiosphere/toil/tree/wip-prov
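For the item about decoupling recorded time from time of recording, one possible approach is to buffer events with their own timestamps during the run and only turn them into provenance records at the end. The sketch below assumes this approach; `TimedEvent` and `record_events` are hypothetical names and do not exist in cwltool/provenance.py.

```python
import datetime
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TimedEvent:
    """Hypothetical buffered event: when something happened in a job."""
    job_id: str
    activity: str
    occurred_at: datetime.datetime


def record_events(events: List[TimedEvent],
                  recorded_at: Optional[datetime.datetime] = None
                  ) -> List[Dict[str, str]]:
    """Write out buffered events at the end of the run.

    The event's own timestamp is kept separate from the time at which
    the record is written.
    """
    if recorded_at is None:
        recorded_at = datetime.datetime.now(datetime.timezone.utc)
    return [
        {
            "job": event.job_id,
            "activity": event.activity,
            "occurred_at": event.occurred_at.isoformat(),
            "recorded_at": recorded_at.isoformat(),
        }
        for event in sorted(events, key=lambda e: e.occurred_at)
    ]


utc = datetime.timezone.utc
events = [
    TimedEvent("step-b", "end", datetime.datetime(2018, 11, 19, 10, 5, tzinfo=utc)),
    TimedEvent("step-a", "start", datetime.datetime(2018, 11, 19, 10, 0, tzinfo=utc)),
]
records = record_events(
    events, recorded_at=datetime.datetime(2018, 11, 19, 11, 0, tzinfo=utc))
```

The same buffering idea would apply to deferring file movements: collect the intended moves during the run and perform them once, when the ResearchObject is assembled.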

┆Issue is synchronized with this JIRA Story
┆Issue Number: TOIL-353

mr-c commented Nov 18, 2018

@psafont Can you update the 1st comment above with your status and any additional work you see that is needed?

psafont commented Nov 19, 2018

Making these changes involves quite a bit of friction because CWLProv is part of the cwltool package. I don't know to what extent it would be beneficial to separate it into its own module.

There is not much separation of concerns in some functions: they use provenance.py's functions directly. I think this is linked with some of the tight coupling we've already solved. The question is how far we want to go. (I've only spent about an hour looking into @inutano's provenance work.)
