prototype workflow RO-Crate from snakemake workflow #1

Open
douglowe opened this issue May 8, 2024 · 5 comments

@douglowe
Contributor

douglowe commented May 8, 2024

This work comes out of the BGE hackathon in Leiden. Anything produced should be recorded in the report here: https://docs.google.com/document/d/1if6ukMKN3xHQHAwGEQPhhgvp7iQcFnauj4W1ZtIs8wk/edit

The aim is to write a Python tool that creates a Workflow RO-Crate from the outputs and reports produced by a Snakemake workflow.

Snakemake workflow used: https://github.com/o-william-white/skim2mt.git
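
A minimal sketch of what such a tool might emit, using only the Python standard library. The entity layout follows the general shape of a Workflow RO-Crate (a root dataset whose `mainEntity` is the workflow file), but the function names, the `workflow/Snakefile` path, and the choice of fields are illustrative assumptions, not something agreed in this issue.

```python
import json
from pathlib import Path


def build_workflow_crate(workflow_file: str, name: str) -> dict:
    """Return a minimal Workflow RO-Crate metadata document as a dict.

    `workflow_file` is the crate-relative path of the Snakefile; the value
    passed in by the caller is an assumption about the repository layout.
    """
    language_id = "https://w3id.org/workflowhub/workflow-ro-crate#snakemake"
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # metadata descriptor: says which entity is the crate root
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # root dataset, pointing at the workflow as its main entity
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            {   # the workflow file itself
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": language_id},
            },
            {   # the workflow language
                "@id": language_id,
                "@type": "ComputerLanguage",
                "name": "Snakemake",
            },
        ],
    }


def write_crate(crate: dict, crate_dir: Path) -> Path:
    """Write the metadata document to <crate_dir>/ro-crate-metadata.json."""
    crate_dir.mkdir(parents=True, exist_ok=True)
    out = crate_dir / "ro-crate-metadata.json"
    out.write_text(json.dumps(crate, indent=2))
    return out
```

For example, `write_crate(build_workflow_crate("workflow/Snakefile", "skim2mt"), Path("crate"))` would write a single metadata file; copying the workflow files into the crate directory and adding authors, licence, and inputs/outputs would still be needed.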

@tbrown91
Collaborator

Hi @douglowe

I am able to give this more thought this week, so I am wondering what the best next steps would be. At the moment, all of the information I have pulled from the HTML report is just sitting in variables. Do you think it will be easy to turn this into a provenance RO-Crate?

@douglowe
Contributor Author

Hi @tbrown91 - I'm getting a bit of time to look at this too, and have conflicting ideas about how to go about this.

In the long term I think we can add to the Snakemake runner itself, creating an 'ro-crate' report option as an alternative to the HTML report. See this issue I created in a local copy of the snakemake repo: eScienceLab/snakemake#1

This probably should start with creating some example RO-Crate files (first a workflow crate, then script the building of a provenance crate from that, using the metadata pulled from the html report), so that we can build a test to include in the snakemake testing suites. Let's have a go at creating that this week?
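
As a rough illustration of the "script the building of a provenance crate" step, the sketch below turns one job record into a schema.org `CreateAction` entity, which is the pattern RO-Crate uses to describe a run. The input dict shape (`rule`, `starttime`, `endtime`, `inputs`, `outputs`) is a hypothetical stand-in for whatever ends up being extracted from the report, and the `@id` scheme is an arbitrary choice.

```python
from datetime import datetime, timezone


def job_to_create_action(job: dict, instrument_id: str) -> dict:
    """Map one job record to a CreateAction entity.

    `job` is assumed (hypothetically) to hold: rule (str), starttime and
    endtime (Unix timestamps), inputs and outputs (lists of file paths).
    `instrument_id` is the @id of the workflow or tool entity in the crate.
    """
    def iso(ts: float) -> str:
        # Convert a Unix timestamp to an ISO 8601 string in UTC.
        return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

    return {
        "@id": f"#run-{job['rule']}-{int(job['starttime'])}",
        "@type": "CreateAction",
        "name": f"Run of rule {job['rule']}",
        "instrument": {"@id": instrument_id},
        "object": [{"@id": path} for path in job["inputs"]],
        "result": [{"@id": path} for path in job["outputs"]],
        "startTime": iso(job["starttime"]),
        "endTime": iso(job["endtime"]),
    }
```

Each returned entity would be appended to the `@graph` of the workflow crate, alongside `File` entities for the referenced inputs and outputs. A fixture built this way could serve as the expected output in a test.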

@tbrown91
Collaborator

Baby steps befd0dd

There are many things I don't like about the Snakemake report, but particularly that the input and output files are not really listed or named. There are a number of wildcards left in, but maybe this is not important for a workflow RO-Crate. For the provenance RO-Crate, I think we will not be able to extract the information we are looking for.

@fbartusch
Collaborator

Hi, I was added by @douglowe as a collaborator to this repository.

You're scraping the information directly from the HTML report, right? The HTML report itself is generated by a Snakemake report plugin and uses the data stored in .snakemake/metadata/ in the workflow's main directory.
Since Snakemake 8 there has been a plugin system for some functionality, including report generation.
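
To illustrate reading that directory directly rather than scraping the report, here is a small sketch. It assumes each file under `.snakemake/metadata/` is a JSON document describing one output file; that on-disk format is an internal detail of Snakemake and may differ between versions, so anything that does not parse as a JSON object is skipped. The function name and the `_source` key are made up for this example.

```python
import json
from pathlib import Path


def load_metadata_records(workdir: Path) -> list[dict]:
    """Collect JSON records found under <workdir>/.snakemake/metadata/.

    Assumption: one JSON object per file. Unreadable or non-JSON files
    are ignored rather than treated as errors.
    """
    metadata_dir = workdir / ".snakemake" / "metadata"
    records: list[dict] = []
    if not metadata_dir.is_dir():
        return records
    for path in sorted(metadata_dir.rglob("*")):
        if not path.is_file():
            continue
        try:
            record = json.loads(path.read_text())
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue
        if isinstance(record, dict):
            # Remember which file the record came from (illustrative key).
            record["_source"] = path.name
            records.append(record)
    return records
```

Calling `load_metadata_records(Path("."))` from the workflow's main directory returns a list of dicts that can be inspected to see which keys are actually present before deciding how to map them onto crate entities.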

There's a Poetry template provided by the Snakemake project for new plugins.
I fiddled around with the template a few weeks ago and was able to get some provenance information rather quickly.

I think that's a cleaner way to get the needed information. Also, it does not break if the HTML report changes its structure, layout, or content. I can provide some code in the next few days, when my work schedule allows it.

@fbartusch
Collaborator

The skim2mt workflow ran through on our cluster. I added a usable Snakemake report plugin to the repo and documented how you can rebuild it in the README.
The plugin works on the skim2mt metadata on our cluster (i.e. it does not throw errors). Although the plugin does nothing useful at the moment, this is a good sign :)


4 participants