# Python Workflows

---

#### Best

- Python workflow package, validation, logging

## `snakemake` Introduction

---

## snakemake

- [2024 snakemake tutorial slides](https://slides.com/johanneskoester/snakemake-tutorial)


- [Tutorial](https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html#)
    - [Start the demo](https://snakemake.readthedocs.io/en/stable/tutorial/setup.html#run-tutorial-for-free-in-the-cloud-via-gitpod)



- [Best Practices](https://snakemake.readthedocs.io/en/stable/snakefiles/best_practices.html)

We Would Like a Tool That

- can figure out how to run a whole workflow based on a set of rules for transforming one file type to another 
- is reproducible
- reruns steps if necessary (input files change or processing steps change)
- runs any necessary steps automatically as new data is added

### Snakemake Is a Rule Based Dependency Tracker

#### Rules

Rules describe how to transform one file type into another. Files are matched by the constant parts of their names (e.g. `.fastq`, `_fastqc.zip`, ...), while the variable part, such as a sample name, is captured by a wildcard like `{sample}`.
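
The wildcard idea can be illustrated with a toy Python sketch. It is not Snakemake's implementation: the helper names `pattern_to_regex` and `infer_input` and the FastQC-style file names are hypothetical. Each `{name}` placeholder becomes a named regex group matching `.+`, the constant parts are matched literally, and the captured values are substituted into the input pattern.

```python
import re
from typing import Optional

def pattern_to_regex(pattern: str):
    """Turn a pattern like '{sample}_fastqc.zip' into a regex with named groups."""
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        # Escape the constant text between wildcards so '.' etc. match literally.
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex)

def infer_input(output_pattern: str, input_pattern: str, target: str) -> Optional[str]:
    """Return the input file needed to build `target`, or None if the rule does not apply."""
    match = pattern_to_regex(output_pattern).fullmatch(target)
    if match is None:
        return None
    # Fill the captured wildcard values into the input pattern.
    return input_pattern.format(**match.groupdict())

# Hypothetical FastQC-style rule: {sample}.fastq -> {sample}_fastqc.zip
print(infer_input("{sample}_fastqc.zip", "{sample}.fastq", "liver_rep1_fastqc.zip"))  # liver_rep1.fastq
print(infer_input("{sample}_fastqc.zip", "{sample}.fastq", "liver_rep1.bam"))         # None
```

Real rules can have several wildcards and constraints on them; this sketch assumes each wildcard name appears at most once per pattern.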

#### Dependency Tracker

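Snakemake automatically determines which files are needed to produce a requested file by chaining the rules backwards from that target. It uses this information to build a dependency graph (a directed acyclic graph of jobs) for the whole workflow. A rule is executed only if its outputs don't exist or are older than its input files; recent Snakemake versions by default also rerun a rule when its code, parameters, or software environment change.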
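
A minimal sketch of backward chaining plus the timestamp check, in plain Python. It assumes a hypothetical rule table with fixed file names (no wildcards) and a dict of modification times standing in for the file system; Snakemake's real scheduler builds a full job DAG and has more rerun triggers.

```python
from typing import Dict, List, Optional

# Hypothetical rule table: each output file maps to the input files it is built from.
RULES: Dict[str, List[str]] = {
    "report.html": ["counts.tsv"],
    "counts.tsv": ["sample.bam"],
    "sample.bam": ["sample.fastq"],
}

def plan(target: str, rules: Dict[str, List[str]], mtimes: Dict[str, float],
         jobs: Optional[List[str]] = None) -> List[str]:
    """Return, in execution order, the outputs that must be (re)built to produce `target`.

    `mtimes` maps existing files to modification times; a missing key means
    the file does not exist.
    """
    if jobs is None:
        jobs = []
    inputs = rules.get(target)
    if inputs is None:
        # No rule produces this file, so it has to exist already as source data.
        if target not in mtimes:
            raise FileNotFoundError(f"no rule to produce {target}")
        return jobs
    for inp in inputs:
        plan(inp, rules, mtimes, jobs)  # resolve upstream files first (depth-first)
    upstream_rebuilt = any(inp in jobs for inp in inputs)
    missing = target not in mtimes
    outdated = not missing and any(
        mtimes.get(inp, float("inf")) > mtimes[target] for inp in inputs
    )
    if (missing or outdated or upstream_rebuilt) and target not in jobs:
        jobs.append(target)
    return jobs

# sample.fastq (t=5) is newer than sample.bam (t=3), so everything downstream reruns.
print(plan("report.html", RULES,
           {"sample.fastq": 5, "sample.bam": 3, "counts.tsv": 4, "report.html": 6}))
# ['sample.bam', 'counts.tsv', 'report.html']
```

Resolving inputs before deciding on the target is what makes a change at the start of the chain propagate to every downstream file.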

There Are Many Such Tools

- make, ninja, scons, waf, ruffus, jug, Rake, bpipe, paver, Galaxy, ...

#### So Why Use `snakemake`?

- Snakefiles are Python code (extended with rule syntax), so a real programming language is available
- designed with bioinformatics in mind
- easy to offload jobs to cluster nodes
- advanced pattern matching of file names via wildcards
- rules can have multiple input and output files
- many bonus features: configuration files, wrappers, target lists, workflow graphs, reports, ...
- keeps track of code changes in rules
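
One way to track code changes is to store a fingerprint of the code that produced each output and compare it on the next run. The toy sketch below assumes a hypothetical in-memory `recorded` dict and hypothetical helpers `fingerprint`, `code_changed`, and `record_build`; Snakemake keeps its own persistent metadata and its details differ.

```python
import hashlib
from typing import Dict

def fingerprint(rule_code: str) -> str:
    """Hash the rule's code (e.g. its shell command) so later edits can be detected."""
    return hashlib.sha256(rule_code.encode("utf-8")).hexdigest()

def code_changed(output: str, rule_code: str, recorded: Dict[str, str]) -> bool:
    """True if `output` was never recorded or was built by different code."""
    return recorded.get(output) != fingerprint(rule_code)

def record_build(output: str, rule_code: str, recorded: Dict[str, str]) -> None:
    """Remember which code produced `output` after a successful run."""
    recorded[output] = fingerprint(rule_code)

recorded: Dict[str, str] = {}
cmd_v1 = "bwa mem ref.fa sample.fastq > sample.sam"
cmd_v2 = "bwa mem -t 8 ref.fa sample.fastq > sample.sam"
record_build("sample.sam", cmd_v1, recorded)
print(code_changed("sample.sam", cmd_v1, recorded))  # False: same code, no rerun needed
print(code_changed("sample.sam", cmd_v2, recorded))  # True: command edited, rerun
```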

Chair of Bioinformatics Research Group, Boku University Vienna, Austria ([Source](https://biologydirect.biomedcentral.com/articles/10.1186/s13062-015-0071-8)):

> "We have initially tested several systems, including, Bpipe [9], Moa [https://github.com/mfiers/Moa], Ruffus [28], and Snakemake [10]. We have since focused on exploring Snakemake due to, among other features, __its make-like workflow definition, simple integration with Python, Bash code portability, ease of porting workflows to a cluster, intuitive parallelization, and ongoing active development__. We are currently working on extending Snakemake with a lightweight modular system for development cycle control and policy-based specification of rules and requirements that supports an in-flow enforcement of consistency constraints. We have developed and validated a proof-of-concept prototype of the mechanism and automated the code generation of rules."

[Johannes Köster](https://groups.google.com/forum/#!topic/snakemake/X_sGS6EiY-M), on the snakemake Google Group:

> It is almost impossible to cover all available workflow management systems, but I can give you a few points:
>
> - compared to GNU Make, Snakemake is more flexible, supporting e.g. multiple output files, Python scripting, conditional inputs, arbitrary resource constraints, cluster execution with DRMAA.
> - Snakefiles can look very clean and readable (almost self documenting).
> - Snakemake allows you to easily separate workflow logic from analysis logic (in the form of e.g. external scripts), without having to write boilerplate code. This further supports readability and also code re-use.
> - We are building a library of Snakemake wrappers around popular bioinformatics tools.
> - Soon, Snakemake will support the automatic installation of the software dependencies of your workflow e.g. via Bioconda. This allows to deploy and execute a workflow on a new machine in a single step.

## Resources

__`snakemake`__

- [Snakemake tutorial](http://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html#tutorial)
- [2024 snakemake tutorial slides](https://slides.com/johanneskoester/snakemake-tutorial)
- Jeremy Leipzig, _[A review of bioinformatic pipeline frameworks](https://doi.org/10.1093/bib/bbw020)_. Briefings in Bioinformatics. 2016
- Johannes Köster, Sven Rahmann, _[Snakemake—a scalable bioinformatics workflow engine](https://academic.oup.com/bioinformatics/article/28/19/2520/290322)_. Bioinformatics, Volume 28, Issue 19, 1 October 2012
- [snakemake wrappers](https://snakemake-wrappers.readthedocs.io/en/stable/index.html)
- [snakemake RNASeq example](https://github.com/seandavi/SnakemakeRNASeqExample)

- [Awesome Pipeline](https://github.com/pditommaso/awesome-pipeline) GitHub page