Existing Workflow systems

Michael R. Crusoe edited this page Jul 12, 2018 · 189 revisions

Computational Data Analysis Workflow Systems

Permalink: https://s.apache.org/existing-workflow-systems

An incomplete list

Please add new entries at the bottom.

See also: https://github.com/pditommaso/awesome-pipeline

  1. Arvados http://arvados.org
  2. Taverna http://www.taverna.org.uk/
  3. Galaxy http://galaxyproject.org/
  4. SHIWA https://www.shiwa-workflow.eu/
  5. Oozie https://oozie.apache.org/
  6. DNAnexus https://wiki.dnanexus.com/API-Specification-v1.0.0/IO-and-Run-Specifications# https://wiki.dnanexus.com/API-Specification-v1.0.0/Workflows-and-Analyses#
  7. BioDT http://www.biodatomics.com/
  8. Agave http://agaveapi.co/live-docs/
  9. DiscoveryEnvironment http://www.iplantcollaborative.org/ci/discovery-environment
  10. Wings http://www.wings-workflows.org/
  11. Knime https://www.knime.org/
  12. make, rake, drake, ant, scons and many others. Software development relies heavily on tools that manage the workflows of compiling and packaging applications. Most of these tools are file-based and run on a single node. Many support parallel steps (make -j), and some can dispatch build steps to other machines (distcc: https://code.google.com/p/distcc/). drake: https://github.com/Factual/drake
  13. Snakemake https://bitbucket.org/snakemake/snakemake
  14. BPipe http://bpipe.org
  15. Ruffus https://code.google.com/p/ruffus/
  16. NextFlow http://nextflow.io
  17. Luigi http://github.com/spotify/luigi
  18. SciLuigi. Helper library built on top of Luigi to ease development of scientific workflows: http://github.com/pharmbio/sciluigi
  19. Luigi Analysis Workflow (LAW) https://github.com/riga/law
  20. GATK Queue https://www.broadinstitute.org/gatk/guide/topic?name=queue
  21. Yabi https://ccg.murdoch.edu.au/yabi
  22. SeqWare. Workflows are written in Java and executed using the Oozie Workflow Engine on Hadoop or SGE clusters. A Zip64 file bundles the workflow definition file, the workflow itself, sample settings, and data dependencies into a single file that can be exchanged between SeqWare users or archived. https://seqware.github.io/ https://seqware.github.io/docs/6-pipeline/
  23. Ketrew https://github.com/hammerlab/ketrew
  24. Pegasus http://pegasus.isi.edu/
  25. Airflow https://github.com/airbnb/airflow
  26. Cosmos https://cosmos.hms.harvard.edu/documentation/index.html http://bioinformatics.oxfordjournals.org/content/early/2014/07/24/bioinformatics.btu385.full [paper] Cosmos2: https://github.com/LPM-HMS/COSMOS2 http://cosmos.hms.harvard.edu/COSMOS2/
  27. Pinball https://github.com/pinterest/pinball
  28. bcbio https://bcbio-nextgen.readthedocs.org/en/latest/
  29. Chronos https://github.com/mesos/chronos
  30. Azkaban https://azkaban.github.io/
  31. Apache NiFi https://nifi.apache.org/docs/nifi-docs/html/overview.html
  32. flowr (R-based) http://docs.flowr.space/ https://github.com/sahilseth/flowr
  33. Mistral https://github.com/arteria-project https://wiki.openstack.org/wiki/Mistral#What_is_Mistral.3F https://docs.openstack.org/mistral/latest/user/wf_lang_v2.html
  34. nipype http://nipy.org/nipype/
  35. End of Day https://github.com/joestubbs/endofday
  36. BioDSL https://github.com/maasha/BioDSL
  37. BigDataScript http://pcingola.github.io/BigDataScript/
  38. Omics Pipe: uses Ruffus http://sulab.scripps.edu/omicspipe/
  39. Ensembl Hive https://github.com/Ensembl/ensembl-hive
  40. QuickNGS http://bifacility.uni-koeln.de/quickngs/web
  41. GenePattern http://www.broadinstitute.org/cancer/software/genepattern/
  42. Chipster http://chipster.csc.fi/
  43. The Genome Modeling System https://github.com/genome/gms
  44. Cuneiform, A Functional Workflow Language https://github.com/joergen7/cuneiform http://www.cuneiform-lang.org/
  45. Anvaya http://www.ncbi.nlm.nih.gov/pubmed/22809419 http://webapp.cabgrid.res.in/biocomp/Anvaya/ANVAYA_Main.html#HOWTO_INSTALL_ANVAYA
  46. Makeflow http://ccl.cse.nd.edu/software/makeflow/
  47. Airavata http://airavata.apache.org/
  48. Pyflow https://github.com/Illumina/pyflow
  49. Cluster Flow http://clusterflow.io
  50. Unipro UGENE http://ugene.net/ https://dx.doi.org/10.7717/peerj.644
  51. CloudSlang http://www.cloudslang.io/
  52. Stacks http://catchenlab.life.illinois.edu/stacks/
  53. Leaf http://www.francesconapolitano.it/leaf/index.html
  54. omictools http://omictools.com/
  55. Job Description Language (JDL). A high-level, user-oriented language based on Condor classified advertisements (ClassAds) for describing jobs and aggregates of jobs such as directed acyclic graphs and collections. https://edms.cern.ch/ui/file/590869/1/WMS-JDL.pdf
  56. YAWL (Yet Another Workflow Language) http://dx.doi.org/10.1016/j.is.2004.02.002 http://www.yawlfoundation.org/
  57. Triquetrum https://projects.eclipse.org/projects/technology.triquetrum https://github.com/eclipse/triquetrum/
  58. Kronos https://github.com/jtaghiyar/kronos
  59. qsubsec http://doi.org/10.1093/bioinformatics/btv698 https://github.com/alastair-droop/qsubsec
  60. YesWorkflow http://yesworkflow.org
  61. gwf - Grid WorkFlow https://github.com/gwforg/gwf http://gwf.readthedocs.io/
  62. FireWorks https://pythonhosted.org/FireWorks/
  63. NGLess: NGS with less work http://ngless.rtfd.io
  64. pypipegraph https://github.com/TyberiusPrime/pypipegraph
  65. Cromwell https://github.com/broadinstitute/cromwell
  66. Dagobah - Simple DAG-based job scheduler in Python. https://github.com/thieman/dagobah
  67. sushi https://github.com/uzh/sushi
  68. Clinical Trial Processor - A program for processing clinical trials data. http://mircwiki.rsna.org/index.php?title=MIRC_CTP
  69. Noodles http://nlesc.github.io/noodles/
  70. Swift http://swift-lang.org/main/
  71. Consonance (runs SeqWare & CWL) https://github.com/Consonance/consonance/wiki
  72. Dog https://github.com/dogtools/dog
  73. Produce https://github.com/texttheater/produce
  74. LONI Pipeline http://pipeline.loni.usc.edu/
  75. Cpipe https://github.com/MelbourneGenomics/cpipe
  76. AWE https://github.com/MG-RAST/AWE
  77. (Py)COMPSs https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/
  78. KLIKO https://github.com/gijzelaerr/kliko
  79. Script of Scripts https://github.com/vatlab/SoS https://vatlab.github.io/sos-docs/ https://doi.org/10.1093/bioinformatics/bty405
  80. XNAT Pipeline Engine https://wiki.xnat.org/display/XNAT/Pipeline+Engine https://wiki.xnat.org/display/XNAT/XNAT+Pipeline+Development+Schema
  81. Metapipe https://github.com/TorkamaniLab/metapipe
  82. OCCAM (Open Curation for Computer Architecture Modeling) https://occam.cs.pitt.edu/
  83. Copernicus http://www.copernicus-computing.org
  84. iRODS Rule Language https://github.com/samuell/irods-cheatsheets/blob/master/irods-rule-lang-full-guide.md
  85. VisTrails https://www.vistrails.org
  86. Bionode Watermill https://github.com/bionode/bionode-watermill
  87. BIOVIA Pipeline Pilot Overview http://accelrys.com/products/collaborative-science/biovia-pipeline-pilot/
  88. DAGMan: a meta-scheduler for HTCondor https://research.cs.wisc.edu/htcondor/dagman/dagman.html
  89. UNICORE https://www.unicore.eu/docstore/workflow-7.6.0/workflow-manual.html#wf_dialect
  90. Toil (A scalable, efficient, cross-platform and easy-to-use workflow engine in pure Python) https://github.com/BD2KGenomics/toil
  91. Cylc https://cylc.github.io/cylc/
  92. Autodesk Cloud Compute Cannon https://github.com/Autodesk/cloud-compute-cannon
  93. Civet https://github.com/TheJacksonLaboratory/civet
  94. Cumulus https://github.com/Kitware/cumulus
  95. High-performance integrated virtual environment (HIVE) https://hive.biochemistry.gwu.edu
  96. Cloudgene http://cloudgene.uibk.ac.at/cloudgene-yaml
  97. FASTR https://bitbucket.org/bigr_erasmusmc/fastr/ http://fastr.readthedocs.io/en/stable/
  98. BioMake https://github.com/evoldoers/biomake http://dx.doi.org/10.1101/093245
  99. remake https://github.com/richfitz/remake
  100. SciFloware http://www-sop.inria.fr/members/Didier.Parigot/pmwiki/Scifloware/
  101. OpenAlea http://openalea.gforge.inria.fr/dokuwiki/doku.php https://hal.archives-ouvertes.fr/hal-01166298/file/openalea-PradalCohen-Boulakia.pdf
  102. COMBUSTI/O https://github.com/jarlebass/combustio http://hdl.handle.net/10037/9361
  103. BioCloud https://github.com/ccwang002/biocloud-server-kai http://doi.org/10.6342/NTU201601295
  104. Triana http://www.trianacode.org/
  105. Kepler https://kepler-project.org/
  106. Anduril http://anduril.org/site/
  107. dgsh http://www.dmst.aueb.gr/dds/sw/dgsh/
  108. EDGE bioinformatics: Empowering the Development of Genomics Expertise https://bioedge.lanl.gov/edge_ui/ http://edge.readthedocs.io/ https://lanl-bioinformatics.github.io/EDGE/
  109. Pachyderm http://pachyderm.io/ http://pachyderm.readthedocs.io/en/stable/advanced/advanced.html
  110. Digdag https://www.digdag.io/
  111. Agua / Automated Genomics Utilities Agent http://aguadev.org
  112. BioDepot Workflow Builder (BwB) https://github.com/BioDepot/BioDepot-workflow-builder https://doi.org/10.1101/099010
  113. IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses http://r3lab.uni.lu/web/imp/ https://doi.org/10.1186/s13059-016-1116-8
  114. Butler https://github.com/llevar/butler
  115. adage / yadage https://github.com/diana-hep/adage https://github.com/diana-hep/yadage
  116. HI-WAY: Execution of Scientific Workflows on Hadoop YARN https://github.com/marcbux/Hi-WAY https://openproceedings.org/2017/conf/edbt/paper-248.pdf
  117. OpenMOLE https://github.com/openmole/openmole https://www.openmole.org/ https://doi.org/10.3389/fninf.2017.00021
  118. Biopet https://github.com/biopet/biopet
  119. Nephele https://nephele.niaid.nih.gov/
  120. TOPPAS http://doi.org/10.1021/pr300187f
  121. SBpipe https://pdp10.github.io/sbpipe/ https://github.com/pdp10/sbpipe https://doi.org/10.1186/s12918-017-0423-3
  122. Dray http://dray.it/
  123. GenomeVIP https://github.com/ding-lab/GenomeVIP https://doi.org/10.1101/gr.211656.116
  124. GridSAM https://sourceforge.net/projects/gridsam/
  125. Roddy https://github.com/eilslabs/Roddy
  126. SciFlo (historical; doesn't seem to be maintained anymore) https://web.archive.org/web/20161118011409/https://sciflo.jpl.nasa.gov/SciFloWiki/FrontPage
  127. GNU Guix Workflow Language https://git.roelj.com/guix/gwl.git#gnu-guix-workflow-language-extension https://github.com/UMCUGenetics/guix-workflows/blob/master/umcu/workflows/rnaseq.scm
  128. Porcupine https://timvanmourik.github.io/Porcupine/
  129. Parsl (a Parallel Scripting Library for Python) http://parsl-project.org
  130. ECFLOW (Workflow primarily for Meteorological Applications) https://software.ecmwf.int/wiki/display/ECFLOW/ecflow+home
  131. Ophidia http://ophidia.cmcc.it/
  132. WebLicht https://weblicht.sfs.uni-tuebingen.de/
  133. GATE Cloud https://cloud.gate.ac.uk/
  134. SCIPION http://scipion.cnb.csic.es/m/home/ https://github.com/I2PC/scipion/wiki/Creating-a-Protocol
  135. Ergatis http://ergatis.sourceforge.net/
  136. TIGR "Workflow" https://sourceforge.net/projects/tigr-workflow/ http://tigr-workflow.sourceforge.net/
  137. Archivematica https://wiki.archivematica.org/Main_Page (A preservation workflow system that implements the ISO-OAIS standard using gearman/MCP)
  138. Martian http://martian-lang.org/about/
  139. BioMAJ http://genouest.github.io/biomaj/
  140. Conveyor http://conveyor.cebitec.uni-bielefeld.de (retired). https://academic.oup.com/bioinformatics/article/27/7/903/230562/Conveyor-a-workflow-engine-for-bioinformatic
  141. Biopipe http://www.biopipe.org (appears to be defunct) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC403782/
  142. Wildfire http://wildfire.bii.a-star.edu.sg/ https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-69
  143. BioWBI http://bioinformatics.hsanmartino.it/bits_library/library/00079.pdf
  144. BioWMS http://bioinformatics.hsanmartino.it/bits_library/library/00568.pdf
  145. BioMoby http://biomoby.open-bio.org/ https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-523
  146. SIBIOS http://ieeexplore.ieee.org/document/1309094/
  147. NGSANE https://github.com/BauerLab/ngsane https://academic.oup.com/bioinformatics/article/30/10/1471/266879/NGSANE-a-lightweight-production-informatics
  148. Pwrake https://github.com/misshie/Workflows https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3180464/
  149. Nesoni https://github.com/Victorian-Bioinformatics-Consortium/nesoni
  150. Skam http://skam.sourceforge.net/skam-intro.html
  151. TREVA http://bioinformatics.petermac.org/treva/ http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0095217
  152. EGene https://www.semanticscholar.org/paper/EGene-a-configurable-pipeline-generation-system-fo-Durham-Kashiwabara/4c0656195b5efcdd3aa7bdcb55fc95a957c150aa https://academic.oup.com/bioinformatics/article/30/18/2659/2475637/EuGene-PP-a-next-generation-automated-annotation
  153. WEP https://bioinformatics.cineca.it/wep/ https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-S7-S11
  154. Microbase http://www.microbasecloud.com/
  155. e-Science Central http://www.esciencecentral.co.uk/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3538293/
  156. Cyrille2 https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-96
  157. PaPy https://code.google.com/archive/p/papy/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3051902/
  158. JobCenter https://github.com/yeastrc/jobcenter https://scfbm.biomedcentral.com/articles/10.1186/1751-0473-7-8
  159. CoreFlow https://www.ncbi.nlm.nih.gov/pubmed/24503186
  160. dynamic-pipeline https://code.google.com/archive/p/dynamic-pipeline/
  161. XiP http://xip.hgc.jp/wiki/en/Main_Page https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530915/
  162. Eoulsan http://www.outils.genomique.biologie.ens.fr/eoulsan/ https://www.ncbi.nlm.nih.gov/pubmed/22492314
  163. CloudDOE http://clouddoe.iis.sinica.edu.tw/
  164. BioPig https://github.com/JGI-Bioinformatics/biopig https://www.ncbi.nlm.nih.gov/pubmed/24021384
  165. SeqPig https://github.com/HadoopGenomics/SeqPig https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866557/
  166. zymake http://www-personal.umich.edu/~ebreck/code/zymake/
  167. JMS https://github.com/RUBi-ZA/JMS http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0134273
  168. CLC Genomics Workbench https://www.qiagenbioinformatics.com/products/clc-genomics-workbench/
  169. NG6 http://ng6.toulouse.inra.fr/ https://www.ncbi.nlm.nih.gov/pubmed/22958229
  170. VIBE http://www.incogen.com/vibe/
  171. WDL (Workflow Description Language) https://github.com/broadinstitute/wdl
  172. SciFlow https://github.com/kaizhang/SciFlow (not to be confused with SciFloware and SciFlo).
  173. Bioshake https://github.com/PapenfussLab/bioshake
  174. SciPipe http://scipipe.org
  175. Kapacitor / TICKscripts https://docs.influxdata.com/kapacitor/v1.3//tick/
  176. AiiDA: Automated Interactive Infrastructure and Database for Computational Science http://www.aiida.net/
  177. Reflow: a language and runtime for distributed, integrated data processing in the cloud https://github.com/grailbio/reflow
  178. Resolwe: an open source dataflow package for Django framework https://github.com/genialis/resolwe
  179. Yahoo! Pipes (historical) https://en.wikipedia.org/wiki/Yahoo!_Pipes
  180. Walrus https://github.com/fjukstad/walrus
  181. Apache Beam https://beam.apache.org/
  182. CLOSHA https://closha.kobic.re.kr/ https://www.bioexpress.re.kr/go_tutorial http://docplayer.net/19700397-Closha-manual-ver1-1-kobic-korean-bioinformation-center-kogun82-kribb-re-kr-2016-05-08-bioinformatics-workflow-management-system-in-bio-express.html https://doi.org/10.1186/s12859-018-2019-3
  183. WopMars https://github.com/aitgon/wopmars http://wopmars.readthedocs.io/
  184. flowing-clj https://github.com/stain/flowing-clj
  185. Plumbing and Graph https://github.com/plumatic/plumbing
  186. LabView http://www.ni.com/en-us/shop/labview.html
  187. MyOpenLab http://myopenlab.org/
  188. Max/MSP https://cycling74.com/products/max/
  189. NoFlo https://noflojs.org/
  190. Flowstone http://www.dsprobotics.com/flowstone.html
  191. HyperLoom https://code.it4i.cz/ADAS/loom
  192. Dask http://dask.pydata.org/en/latest/ https://github.com/dask/dask
  193. Stimela https://github.com/SpheMakh/Stimela https://github.com/SpheMakh/Stimela/wiki https://www.acru.ukzn.ac.za/~cosmosafari2017/wp-content/uploads/2017/02/makhathini.pdf
  194. JTracker https://jthub.co/ https://github.com/jtracker-io
  195. PipelineDog http://pipeline.dog/ https://github.com/zhouanbo/pipelinedog https://doi.org/10.1093/bioinformatics/btx759
  196. DALiuGE https://arxiv.org/abs/1702.07617 https://github.com/ICRAR/daliuge https://daliuge.readthedocs.io/
  197. Overseer https://github.com/framed-data/overseer
  198. Squonk https://squonk.it/
  199. GC3Pie https://github.com/uzh/gc3pie
  200. Fractalide https://github.com/fractalide/fractalide
  201. TOGGLe https://doi.org/10.1101/245480 http://toggle.southgreen.fr/
  202. Askalon http://www.askalon.org
  203. Eclipse ICE (The Integrated Computational Environment) https://www.eclipse.org/ice
  204. Sandia Analysis Workbench (SAW) http://www.sandia.gov/saw/
  205. dispel4py https://github.com/dispel4py/dispel4py
  206. Jobber https://pypi.python.org/pypi/Jobber/0.1.4
  207. NeatSeq-Flow http://neatseq-flow.readthedocs.io/
  208. S4M https://bitbucket.org/uqokorn/s4m_base/wiki/Home
  209. Loom http://med.stanford.edu/gbsc/loom.html https://github.com/StanfordBioinformatics/loom http://loom.readthedocs.io/en/latest/templates.html
  210. Watchdog https://doi.org/10.1186/s12859-018-2107-4 https://github.com/klugem/watchdog
  211. phpflo https://github.com/phpflo/phpflo
  212. BASTet: Berkeley Analysis and Storage Toolkit https://openmsi.nersc.gov/openmsi/client/bastet.html https://biorack.github.io/BASTet/ https://doi.org/10.1109/TVCG.2017.2744479
  213. Tavaxy: Pattern based workflow system for the bioinformatics domain http://www.tavaxy.org/
  214. Ginflow: Decentralised adaptive workflow engine https://ginflow.inria.fr/
  215. SciApps: A cloud-based platform for reproducible bioinformatics workflows https://doi.org/10.1093/bioinformatics/bty439 https://www.sciapps.org/
  216. Stoa: Script Tracking for Observational Astronomy https://github.com/petehague/Stoa
  217. Collective Knowledge (CK) framework http://cknowledge.org/
  218. QosCosGrid (QCG) http://www.qoscosgrid.org/ http://www.qoscosgrid.org/trac/qcg-broker/wiki/qcg-advanced-client%20
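
Entry 12 describes the build-tool family (make and friends) as file-based. As a rough illustration of what that means, the sketch below re-runs a step only when its output file is missing or older than one of its inputs. It is a minimal sketch of the general idea, not the implementation of any tool listed here. The names `is_stale`, `build` and `concatenate` are hypothetical helpers invented for this example.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def is_stale(target: Path, sources: Sequence[Path]) -> bool:
    """Return True if target is missing or older than any of its sources."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(src.stat().st_mtime > target_mtime for src in sources)


def build(target: Path, sources: Sequence[Path],
          action: Callable[[Path, Sequence[Path]], None]) -> bool:
    """Run action only when target is stale. Return True if it ran."""
    if is_stale(target, sources):
        action(target, sources)
        return True
    return False


def concatenate(target: Path, sources: Sequence[Path]) -> None:
    """Hypothetical build step: write all sources, joined, into target."""
    target.write_text("".join(src.read_text() for src in sources))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "in.txt"
        out = Path(workdir) / "out.txt"
        src.write_text("hello")
        print(build(out, [src], concatenate))  # output missing, so the step runs
        print(build(out, [src], concatenate))  # output is up to date, so it is skipped
```

Real build tools add a great deal on top of this check, such as dependency graphs across many rules and parallel scheduling of independent steps.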