Docs update #76

Merged
merged 36 commits into from Aug 28, 2017
36 commits
ba46e67
added figures to pipelines in examples/pipelines/tests
tiagofilipe12 Aug 23, 2017
49354f4
edited task.md documentation
tiagofilipe12 Aug 23, 2017
5cdbcde
edited streamable tasks related documentation
tiagofilipe12 Aug 23, 2017
97c9293
added more information on I/O and shell commands within operationCreator
tiagofilipe12 Aug 23, 2017
f66cb72
updated orchestration docs
tiagofilipe12 Aug 24, 2017
f97830c
updated documentation on shell commands execution
tiagofilipe12 Aug 24, 2017
93d46a8
improved beginner's walkthrough
tiagofilipe12 Aug 24, 2017
a6dc7b9
removed unnecessary grep
tiagofilipe12 Aug 24, 2017
ec4fd8a
added a minor indententation fix
tiagofilipe12 Aug 24, 2017
52bc651
added a multi-sample two-mappers example
tiagofilipe12 Aug 24, 2017
c450ee7
edited badge for dev
tiagofilipe12 Aug 24, 2017
6920089
main readme updated as well as documentation improved
tiagofilipe12 Aug 25, 2017
9cfe06e
fixed error in main readme
tiagofilipe12 Aug 25, 2017
50089a5
ammend to pipeline
tiagofilipe12 Aug 25, 2017
bd0ada0
edited multiinput pipeline example for two-mappers
tiagofilipe12 Aug 25, 2017
e4ccd43
added visualization tool api description and usage
tiagofilipe12 Aug 25, 2017
62b00b2
Added multiple input handling
tiagofilipe12 Aug 25, 2017
297c321
Added description on uid API
tiagofilipe12 Aug 25, 2017
8343496
added an example pipeline to docs
tiagofilipe12 Aug 25, 2017
3749f14
added forkception to docs and its current known limitation
tiagofilipe12 Aug 25, 2017
7cd0db9
added desdcription on resolution of input and changed streamable task…
tiagofilipe12 Aug 25, 2017
3fb094e
fixed a typo
tiagofilipe12 Aug 25, 2017
385498c
amended the explanation of collection
tiagofilipe12 Aug 25, 2017
7c2d33d
fixed review comments
tiagofilipe12 Aug 25, 2017
7679b3c
added pipeline contributions to main readme
tiagofilipe12 Aug 25, 2017
4ead347
Corrected gh naming
tiagofilipe12 Aug 25, 2017
d0931bd
added npm badge
tiagofilipe12 Aug 25, 2017
e8e167e
updated docs to have streams in feature list
tiagofilipe12 Aug 27, 2017
5d610e2
updated multiple input pipeline for two-mappers
tiagofilipe12 Aug 27, 2017
034af26
fixed formatting on docs
tiagofilipe12 Aug 27, 2017
5c7a63b
added some minor changes to through task description and test
tiagofilipe12 Aug 27, 2017
9859712
fixed input patterns from current working directory not being properl…
tiagofilipe12 Aug 28, 2017
974d58d
added definitive multi input pipeline for two-mappers pipeline.js
tiagofilipe12 Aug 28, 2017
4e6f589
uptaded docs regarding multiple input api changes
tiagofilipe12 Aug 28, 2017
48c0066
added headers to begginer's walkthrough
tiagofilipe12 Aug 28, 2017
4593a4c
Merge branch 'dev' into docs_update
tiagofilipe12 Aug 28, 2017
150 changes: 87 additions & 63 deletions README.md
@@ -1,101 +1,125 @@
<p align="center">
<a href="http://bionode.io">
<img height="200" width="200" title="bionode" alt="bionode logo" src="https://rawgithub.com/bionode/bionode/master/docs/bionode-logo.min.svg"/>
</a>
<br/>
<a href="http://bionode.io/">bionode.io</a>
</p>

# bionode-watermill

[![npm version](https://badge.fury.io/js/bionode-watermill.svg)](https://badge.fury.io/js/bionode-watermill) [![node](https://img.shields.io/badge/node-v6.x-blue.svg)]() [![Build Status](https://travis-ci.org/bionode/bionode-watermill.svg?branch=master)](https://travis-ci.org/bionode/bionode-watermill) [![codecov.io](https://codecov.io/github/bionode/bionode-watermill/coverage.svg?branch=master)](https://codecov.io/github/bionode/bionode-watermill?branch=master)
> Bionode-watermill: A (Not Yet Streaming) Workflow Engine

*Watermill: A Streaming Workflow Engine*
[![npm version](https://badge.fury.io/js/bionode-watermill.svg)](https://badge.fury.io/js/bionode-watermill)
[![node](https://img.shields.io/badge/node-v6.x-blue.svg)]()
[![Build Status](https://travis-ci.org/bionode/bionode-watermill.svg?branch=dev)](https://travis-ci.org/bionode/bionode-watermill)
[![codecov.io](https://codecov.io/github/bionode/bionode-watermill/coverage.svg?branch=master)](https://codecov.io/github/bionode/bionode-watermill?branch=master)
[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/bionode/bionode-watermill)

[![NPM](https://nodei.co/npm/bionode-watermill.png?downloads=true&stars=true)](https://nodei.co/npm/bionode-watermill/)

## Table of Contents

* [What is bionode-watermill](#what-is-bionode-watermill)
* [Main features](#main-features)
* [Who is this tool for?](#who-is-this-tool-for)
* [Installation](#installation)
* [Documentation](#documentation)
* [Tutorial](#tutorial)
* [Example pipelines](#example-pipelines)
* [Why bionode-watermill?](#why-bionode-watermill)
* [Contributing](#contributing)




- [CWL?](#cwl)
- [What is a task?](#what-is-a-task)
- [What are orchestrators?](#what-are-orchestrators)
- [Check out bionode-watermill tutorial!](#check-out-bionode-watermill-tutorial)
- [Example pipelines](#example-pipelines)
- [Why bionode-watermill?](#why-bionode-watermill)
- [Who is this tool for?](#who-is-this-tool-for)
## What is bionode-watermill

**Bionode-watermill** is a workflow engine that lets you assemble and run
bioinformatic pipelines with ease and less overhead. Bionode-watermill
pipelines are essentially Node.js scripts in which
[tasks](docs/BeginnerWalkthrough.md#task) are the modules that will be
assembled in the final *pipeline* using [orchestrators](docs/BeginnerWalkthrough.md#orchestrators).

Watermill lets you *orchestrate* **tasks** using operators like **join**, **junction**, and **fork**. Each task has a [lifecycle](https://thejmazz.gitbooks.io/bionode-watermill/content/TaskLifecycle.html) where

1. Input [glob patterns](https://github.com/isaacs/node-glob) are resolved to absolute file paths (e.g. `*.bam` to `reads.bam`).
2. The **operation** is run, passed resolved input, params, and other props.
3. The operation completes.
4. Output glob patterns are resolved to absolute file paths.
5. Validators are run over the output. They check for non-null files, and custom validators can be passed in.
6. Post-validations are run. The task and its output are added to the DAG.
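
The lifecycle steps above can be sketched in plain Node.js. This is a minimal illustration, not bionode-watermill's implementation: `resolvePattern` and `runTask` are hypothetical helpers, the pattern matching only understands a leading `*` (real glob patterns are much richer), and the "DAG" is just an array.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical helper: resolves a simple `*.ext` pattern to absolute paths.
// Only a leading `*` is supported; this is not a real glob implementation.
function resolvePattern(pattern, dir) {
  const suffix = pattern.replace(/^\*/, '')
  return fs.readdirSync(dir)
    .filter(name => name.endsWith(suffix))
    .map(name => path.join(dir, name))
}

// Hypothetical runner that walks one task through the six lifecycle steps.
function runTask({ input, output, dir }, operation, dag = []) {
  const resolvedInput = resolvePattern(input, dir)            // 1. resolve input
  operation({ input: resolvedInput })                         // 2-3. run the operation to completion
  const resolvedOutput = resolvePattern(output, dir)          // 4. resolve output
  const valid = resolvedOutput.length > 0 &&
    resolvedOutput.every(file => fs.statSync(file).size > 0)  // 5. validate: files exist and are non-empty
  if (!valid) throw new Error(`no valid output for ${output}`)
  dag.push({ input: resolvedInput, output: resolvedOutput })  // 6. record task and output
  return dag
}

// Usage: uppercase every `.lowercase` file in a scratch directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'lifecycle-'))
fs.writeFileSync(path.join(dir, 'reads.lowercase'), 'acgt')

const dag = runTask({ input: '*.lowercase', output: '*.uppercase', dir }, ({ input }) => {
  for (const file of input) {
    const text = fs.readFileSync(file, 'utf8').toUpperCase()
    fs.writeFileSync(file.replace(/lowercase$/, 'uppercase'), text)
  }
})

console.log(dag[0].output.map(file => path.basename(file)))
```

The sketch runs the operation synchronously to stay short; a real task may return a stream or a shell command instead.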
### Main features

## CWL?
* Modularity
* Reusability
* Automated Input/Output handling
* Ability to run programs using Unix shell
* Node.js integration
* [Streamable tasks](docs/Task.md#streamable-tasks-potential) (not yet
implemented; see issue [#79](https://github.com/bionode/bionode-watermill/issues/79))

Coming soon.
### Who is this tool for?

## What is a task?
Bionode-watermill is for **biologists** who understand it is important to
experiment with sample data, parameter values, and tools. Compared to other
workflow systems, the ease of swapping around parameters and tools is much
improved, allowing you to iteratively compare results and construct more
confident inferences. Consider the ability to construct your own
[Teaser](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0803-1)
for *your data* with a *simple syntax*, and getting utmost performance out of the box.

Bionode-watermill is for **programmers** who desire an efficient and
easy-to-write methodology for developing complex and dynamic data pipelines,
while handling parallelization as much as possible. Bionode-watermill is an npm
module, and is accessible by anyone willing to learn a little JavaScript. This
is in contrast to other tools which develop their own DSL
(domain specific language), which is not useful outside the tool. By leveraging
the npm ecosystem and JavaScript on the client, Bionode-watermill can be built
upon for inclusion on web apis, modern web applications, as well as native
applications through [Electron](http://electron.atom.io/). Look forward to
seeing Galaxy-like applications backed by a completely configurable Node API.

A `task` is the fundamental unit pipelines are built with. For more details, see [Task](https://thejmazz.gitbooks.io/bionode-watermill/content/Task.html). At a glance, a task is created by passing in **props** and an **operationCreator**, which will later be called with the resolved input. Consider this task which takes a "lowercase" file and creates an "uppercase" one:

```javascript
const fs = require('fs')
// Assumption: `through` is the through2 module, whose transform
// signature is (chunk, enc, next).
const through = require('through2')
const { task } = require('bionode-watermill')

const uppercase = task({
  input: '*.lowercase',
  output: '*.uppercase'
}, function(resolvedProps) {
  const input = resolvedProps.input

  // Stream the input file through an uppercasing transform into the output file.
  return fs.createReadStream(input)
    .pipe(through(function(chunk, enc, next) {
      next(null, chunk.toString().toUpperCase())
    }))
    .pipe(fs.createWriteStream(input.replace(/lowercase$/, 'uppercase')))
})
```

A "task declaration" like above will not immediately run the task. Instead, the task declaration returns an "invocable task" that can either be called directly or used with an orchestration operator. Tasks can also be created to **run shell programs**:
## Installation

```javascript
const fastqDump = task({
  input: '**/*.sra',
  output: [1, 2].map(n => `*_${n}.fastq.gz`),
  name: 'fastq-dump **/*.sra'
}, ({ input }) => `fastq-dump --split-files --skip-technical --gzip ${input}` )
```
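
To see what the `fastqDump` declaration above evaluates to, the operationCreator and the output patterns can be run as ordinary JavaScript, without bionode-watermill. The input path `/data/sample.sra` is a hypothetical resolved path used only for illustration; how the returned string is then executed by the engine is not shown here.

```javascript
// The operationCreator from the `fastqDump` task, as a plain function:
// it receives the resolved props and returns a shell command string.
const operationCreator = ({ input }) =>
  `fastq-dump --split-files --skip-technical --gzip ${input}`

// Output patterns built the same way as in the task props.
const outputPatterns = [1, 2].map(n => `*_${n}.fastq.gz`)

// Hypothetical resolved input path, for illustration only.
const command = operationCreator({ input: '/data/sample.sra' })

console.log(outputPatterns) // [ '*_1.fastq.gz', '*_2.fastq.gz' ]
console.log(command) // fastq-dump --split-files --skip-technical --gzip /data/sample.sra
```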
Local installation:

## What are orchestrators?
```npm install bionode-watermill```

Orchestrators are functions which can take tasks as params in order to let you compose your pipeline from a high level view. This **separates task order from task declaration**. For more details, see [Orchestration](https://thejmazz.gitbooks.io/bionode-watermill/content/Orchestration.html). At a glance, here is a complex usage of `join`, `junction`, and `fork`:
Global installation:

```javascript
const pipeline = join(
  junction(
    join(getReference, bwaIndex),
    join(getSamples, fastqDump)
  ),
  trim, mergeTrimEnds,
  decompressReference, // only b/c mpileup did not like fna.gz
  join(
    fork(filterKMC, filterKHMER),
    alignAndSort, samtoolsIndex, mpileupAndCall // 2 instances each of these
  )
)
```
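
The "2 instances each of these" comment can be illustrated with a toy model that only counts how many times each task would be instantiated. This is a sketch under assumptions, not the library's orchestrators: the `task`, `join`, `junction`, and `fork` functions below are stand-ins that build a plain tree, and the rule that a `fork` inside a `join` duplicates the tasks after it is taken from the comment in the example above.

```javascript
// Toy constructors: each returns a plain description of the pipeline tree.
const task = name => ({ kind: 'task', name })
const join = (...children) => ({ kind: 'join', children })
const junction = (...children) => ({ kind: 'junction', children })
const fork = (...children) => ({ kind: 'fork', children })

// Walks the tree and records how many times each task would be instantiated.
// `incoming` is the number of lineages reaching this node; the return value
// is the number of lineages leaving it.
function walk(node, incoming, counts) {
  if (node.kind === 'task') {
    counts[node.name] = (counts[node.name] || 0) + incoming
    return incoming
  }
  if (node.kind === 'join') {
    // Children run one after another; a fork widens everything after it.
    return node.children.reduce(
      (lineages, child) => walk(child, lineages, counts),
      incoming
    )
  }
  // junction and fork both run their children side by side...
  node.children.forEach(child => walk(child, incoming, counts))
  // ...but only fork keeps the branches apart for downstream tasks.
  return node.kind === 'fork' ? incoming * node.children.length : incoming
}

const countInstances = pipeline => {
  const counts = {}
  walk(pipeline, 1, counts)
  return counts
}

const pipeline = join(
  junction(task('getReference'), task('getSamples')),
  task('trim'),
  join(fork(task('filterKMC'), task('filterKHMER')), task('alignAndSort'))
)

console.log(countInstances(pipeline))
```

In this model `trim` is counted once because `junction` merges back into a single lineage, while `alignAndSort` is counted twice because it follows a two-way `fork`.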
```npm install bionode-watermill -g```

## Check out bionode-watermill tutorial!
## Documentation

- [Try out bionode-watermill tutorial](https://github.com/bionode/bionode-watermill-tutorial)
Our documentation is available [here](https://thejmazz.gitbooks.io/bionode-watermill/content/).
It explains how to **use** bionode-watermill to construct and **run** your
pipelines, and it describes the API for anyone willing to **contribute**.


## Tutorial

- [Try bionode-watermill tutorial!](https://github.com/bionode/bionode-watermill-tutorial)

## Example pipelines

- [Toy pipeline with shell/node](https://github.com/bionode/bionode-watermill/blob/master/examples/pipelines/pids/pipeline.js)
- [Simple capitalize task](https://github.com/bionode/bionode-watermill/blob/master/examples/pipelines/capitalize/capitalize.js)
- [Simple SNP calling](https://github.com/bionode/bionode-watermill/blob/master/examples/pipelines/variant-calling-simple/pipeline.js)
- [SNP calling with filtering and fork](https://github.com/bionode/bionode-watermill/blob/master/examples/pipelines/variant-calling-filtered/pipeline.js)
- [Mapping with bowtie2 and bwa](https://github.com/bionode/bionode-watermill/tree/master/examples/pipelines/two-mappers)
- [Mapping with bowtie2 and bwa (with tutorial)](https://github.com/bionode/bionode-watermill/tree/master/examples/pipelines/two-mappers)

## Why bionode-watermill?

[This blog post](https://jmazz.me/blog/NGS-Workflows)
compares the available tools to deal with NGS workflows, explaining the
advantages of each one, including **bionode-watermill**.

## Who is this tool for?

Bionode-watermill is for **programmers** who desire an efficient and easy-to-write methodology for developing complex and dynamic data pipelines, while handling parallelization as much as possible. Bionode-watermill is an npm module, and is accessible by anyone willing to learn a little JavaScript. This is in contrast to other tools which develop their own DSL (domain specific language), which is not useful outside the tool. By leveraging the npm ecosystem and JavaScript on the client, Bionode-watermill can be built upon for inclusion on web apis, modern web applications, as well as native applications through [Electron](http://electron.atom.io/). Look forward to seeing Galaxy-like applications backed by a completely configurable Node API.
## Contributing
We welcome all kinds of contributions at all levels of experience. Please
refer to the [Issues section](https://github.com/bionode/bionode-watermill/issues).
You can also always reach us on [Gitter](https://gitter.im/bionode/bionode-watermill).

### Feel free to submit your pipeline to us

Bionode-watermill is for **biologists** who understand it is important to experiment with sample data, parameter values, and tools. Compared to other workflow systems, the ease of swapping around parameters and tools is much improved, allowing you to iteratively compare results and construct more confident inferences. Consider the ability to construct your own [Teaser](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0803-1) for *your data* with a *simple syntax*, and getting utmost performance out of the box.
Just open a PR that adds a pipeline under `./examples/pipelines/`.
You can check some of the already existing examples [here](examples/pipelines).