enable batch execution of workflows #1929
I'd like for the solution to this to use pluggy (even if our implementations are just in …). Nextflow could also be a contender; it ticks many of the same boxes as argo and snakemake. Nextflow and toil both seem popular in the bioinformatics space (maybe ask @ksanao).

I'm wondering if update & rerun are the right semantics for this. They definitely support the use-case, but I don't know if it's exactly what a user wants. I'm thinking of a user workflow where you have a small dummy dataset for developing and a big real dataset for running on the cluster. I don't think you develop on the dummy dataset and then, once everything works, run it on the big dataset and are happy with the results. Rather, you'd probably go back and forth between the small and big dataset over time, adding things as needed, extending the analysis, etc.
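A minimal sketch of how a pluggy-based backend interface could look. The hook names, the `WorkflowBackendSpec`/`ArgoBackend` classes and the `"renku_workflow"` project name are all assumptions for illustration, not the actual renku-python plugin API:

```python
# Hypothetical sketch of a pluggy-based workflow-backend interface;
# hook and class names are illustrative, not the real renku-python API.
import pluggy

hookspec = pluggy.HookspecMarker("renku_workflow")
hookimpl = pluggy.HookimplMarker("renku_workflow")


class WorkflowBackendSpec:
    """Hooks a workflow backend plugin (argo, snakemake, nextflow, toil, ...) implements."""

    @hookspec
    def backend_name(self):
        """Return the name used to select this backend on the CLI."""

    @hookspec
    def submit(self, dag, parameters):
        """Translate the renku DAG into the backend's format and submit it."""


class ArgoBackend:
    @hookimpl
    def backend_name(self):
        return "argo"

    @hookimpl
    def submit(self, dag, parameters):
        # Convert the DAG to an Argo Workflow manifest and send it to the cluster.
        ...


pm = pluggy.PluginManager("renku_workflow")
pm.add_hookspecs(WorkflowBackendSpec)
pm.register(ArgoBackend())
print(pm.hook.backend_name())  # -> ["argo"]
```

The appeal of this kind of layout is that each backend stays an optional, separately installable plugin while the core only knows about the hook specification.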
Thanks @Panaetius - you're absolutely right, I forgot to add nextflow to the list - will do so now. Regarding the command semantics - yes, you're right, I was being a bit too myopic. We definitely need to support a different kind of command here.
Here we could allow for seamlessly using workflow templates from other projects (or even other instances), e.g. by referencing a template in another project or on another RenkuLab instance.
Running such a command without a parameter list would prompt you for whatever inputs need to be specified. This starts to bleed a bit into SwissDataScienceCenter/renku-python#1553 and probably other open issues.

Re: async or sync: ideally it would be possible to do this asynchronously, with the special case where you want to wait for completion. Since it's to be used from the UI, async needs to be supported, but maybe starting with sync mode would be sufficient for the PoC.
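A minimal sketch of what an async-first execution API with an optional wait-for-completion mode could look like. All names here (`submit_workflow`, `JobStatus`, `execute`, the backend `poll` method) are hypothetical, not an existing renku API:

```python
# Hypothetical sketch of async-first remote execution with an optional
# synchronous "wait" mode; none of these names exist in renku-python.
import time
from dataclasses import dataclass
from enum import Enum


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class RemoteJob:
    id: str
    status: JobStatus = JobStatus.PENDING


def submit_workflow(dag, backend) -> RemoteJob:
    """Send the DAG to the backend and return immediately with a job handle."""
    return backend.submit(dag)


def execute(dag, backend, wait: bool = False, poll_interval: float = 10.0) -> RemoteJob:
    """Async by default; with wait=True, block until the job finishes (sync PoC mode)."""
    job = submit_workflow(dag, backend)
    while wait and job.status in (JobStatus.PENDING, JobStatus.RUNNING):
        time.sleep(poll_interval)
        job = backend.poll(job.id)
    return job
```

The UI would call the async path and poll the job id; a CLI could expose the same call with a wait flag for the simple synchronous case.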
Thanks @rokroskar and @Panaetius! This looks pretty good!
RenkuLab use case for workflow execution on HPC: iSEE Dashboard Data
Context: each workflow consists of a single task executing Rscript with an Rmd file as input.
Check proposals in SwissDataScienceCenter/renku-python#2213 and SwissDataScienceCenter/renku-project-template#118
One of the goals of renku workflows is to allow a user to develop a robust, working pipeline by iterating quickly on a dataset locally (laptop, interactive session) and then send that workflow to a more capable resource to run on a bigger dataset or with parameters requiring extra compute power.
A significant goal of the workflow KG representation was to allow for serialization of workflow information into other formats. At the moment only the Common Workflow Language (CWL) is supported, but the same methods used to create CWL files can be extended to other workflow languages. One limitation of CWL is that there doesn't seem to be good support for running these workflows either on Kubernetes or on HPC systems.
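As an illustration of what such a serialization amounts to, here is a minimal sketch of rendering one recorded step as a CWL CommandLineTool using plain dicts dumped to YAML. The `step` structure below is an assumed stand-in, not renku's actual internal plan model:

```python
# Minimal sketch: render one recorded step as a CWL CommandLineTool.
# The "step" dict is an assumed stand-in for renku's internal plan model.
import yaml  # pip install pyyaml

step = {
    "command": ["python", "scripts/clean.py"],
    "inputs": {"raw": "data/raw.csv"},
    "outputs": {"clean": "data/clean.csv"},
}

cwl_tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": step["command"],
    "inputs": {
        name: {"type": "File", "inputBinding": {"position": i + 1}}
        for i, name in enumerate(step["inputs"])
    },
    "outputs": {
        name: {"type": "File", "outputBinding": {"glob": path}}
        for name, path in step["outputs"].items()
    },
}

print(yaml.safe_dump(cwl_tool, sort_keys=False))
```

Targeting a different workflow language would mean swapping only the rendering step; the recorded inputs, outputs and command stay the same.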
The goal of this epic is to serve as a roadmap for the implementation of a) the supporting devops infrastructure and b) the required code changes for a simple PoC of batch/remote workflow execution.
General use-case
A simple use-case might look something like this:
The last two steps are identical to what the user can do now, except that they would run in the Kubernetes cluster. The steps should be sent to the workflow engine as a DAG expressed in whatever workflow language is needed by the backend. Some steps might run in parallel. Once all the steps have completed, the changes should be pushed back to the repo, just like a user would do if running those commands locally.
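A sketch of how such a step graph could be assembled and ordered before handing it to a backend, using networkx for the DAG. The step/file structure is illustrative only, not renku's actual metadata model:

```python
# Illustrative only: build a DAG of steps from their input/output files
# and emit them in an order a backend could schedule (parallel where possible).
import networkx as nx

steps = {
    "clean": {"inputs": ["data/raw.csv"], "outputs": ["data/clean.csv"]},
    "train": {"inputs": ["data/clean.csv"], "outputs": ["models/model.pkl"]},
    "plot": {"inputs": ["data/clean.csv"], "outputs": ["figs/overview.png"]},
}

dag = nx.DiGraph()
dag.add_nodes_from(steps)
for name, step in steps.items():
    for other, other_step in steps.items():
        if other == name:
            continue
        # An edge means "other depends on name": name produces a file other consumes.
        if set(step["outputs"]) & set(other_step["inputs"]):
            dag.add_edge(name, other)

# Each generation can run in parallel; generations run in sequence.
for generation in nx.topological_generations(dag):
    print(sorted(generation))  # ['clean'] then ['plot', 'train']
```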
An analogous flow might be envisioned from the web UI, where the page showing the overview of the project's assets might inform the user that some workflow outputs are out of date and give the option to update them automatically.
Issues to consider
There are several issues to consider (in no particular order of importance):
Building a PoC
The result of this epic should be a general architecture for running remote workflows and a PoC that implements the architecture for some subset of the above functionality using a specific workflow backend. One obvious choice for Kubernetes is Argo Workflows. Other potential options include snakemake, toil and nextflow.
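For the Argo option, a minimal sketch of submitting a single-step Workflow from Python via the Kubernetes custom-objects API. The image, namespace and command are placeholders, and this is not part of renku itself:

```python
# Minimal sketch: submit a one-step Argo Workflow through the Kubernetes API.
# Image, namespace and command are placeholders for whatever the renku step needs.
from kubernetes import client, config

workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "renku-batch-"},
    "spec": {
        "entrypoint": "run-step",
        "templates": [
            {
                "name": "run-step",
                "container": {
                    "image": "renku/renku-python:latest",  # placeholder image
                    "command": ["renku"],
                    "args": ["update", "--all"],
                },
            }
        ],
    },
}

config.load_kube_config()  # or load_incluster_config() when running in the cluster
client.CustomObjectsApi().create_namespaced_custom_object(
    group="argoproj.io",
    version="v1alpha1",
    namespace="renku-workflows",  # placeholder namespace
    plural="workflows",
    body=workflow,
)
```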