Welcome to Our "Get Started with Nextflow" Tutorial!

Hi everyone! I am a bioinformatics software engineer, and I’m delighted to share how Nextflow can transform the way you build and run bioinformatic pipelines. My goal is to encourage you to create pipelines in Nextflow, and I'll help you take them to the cloud for scalable, robust execution.

Background


Why Nextflow?

Nextflow is becoming incredibly popular because it simplifies the development of bioinformatic pipelines. It replaces messy, hard-to-maintain Bash, Python, or Perl scripts with a structured and scalable framework. With Nextflow, you can:

- Write cleaner, more organized pipelines
- Run your pipelines on your laptop, a cluster, or the cloud with minimal changes
- Automate complex workflows with ease

Complexity Concerns

Yes, Nextflow is powerful, with lots of features and plugins! If you've browsed Nextflow code on GitHub, you may have noticed that some pipelines look intimidating. This complexity can scare researchers away from adopting Nextflow.
But don't worry! Nextflow becomes simple once you understand its basic concepts. I've gathered beginner-friendly examples, documentation, and tutorials to help you get started. My mission is to guide you through creating a Nextflow pipeline, running it, and customizing it. Once you're ready, I'll help you deploy it on the cloud for maximum scalability.

Nextflow Core Concepts

This outline introduces Nextflow's core concepts (processes and channels) along with other key features to help you get started.

Processes and Channels


The figure here illustrates the relationship between processes (tasks) and channels (the connections between tasks) using a simple Hello World pipeline, shown both as code and as a directed acyclic graph (DAG).
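A minimal Hello World pipeline in Nextflow DSL2 might look like the sketch below. The process name SAY_HELLO and the two greetings are illustrative choices, not taken from the figure. The channel emits two values, and each value becomes one independent task of the process.

```nextflow
#!/usr/bin/env nextflow

// Process: a task definition; Nextflow runs it once per input item
process SAY_HELLO {
    input:
    val greeting

    output:
    stdout

    script:
    """
    echo '${greeting}, world!'
    """
}

workflow {
    // Channel: emits 'Hello' and 'Bonjour' as two separate items
    greetings_ch = Channel.of('Hello', 'Bonjour')

    // Each item triggers one SAY_HELLO task; tasks may run in parallel
    SAY_HELLO(greetings_ch)

    // Print the standard output of every task (order is not guaranteed)
    SAY_HELLO.out.view()
}
```

Saved as hello.nf, it runs with nextflow run hello.nf; adding -with-dag flowchart.png also renders the DAG (PNG output requires Graphviz).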

Processes: Represent individual tasks or steps in a pipeline (e.g., running a script or tool).
- Input types:
  - val: simple values (e.g., strings, numbers).
  - path: files or directories (staged into the process’s working directory).
  - stdin, tuple, env, etc.
- Output types:
  - Similar to inputs (e.g., val, path, stdout).
  - Use the emit option to name outputs for easier access in workflows (e.g., emit: result).
- Script:
  - By default, scripts run in Bash. Write them as multi-line strings with triple double quotes """ (Nextflow variables such as $seq are interpolated) or triple single quotes ''' (no interpolation).
  - Other scripting languages such as Python, Perl, or R are supported by starting the script with a shebang line, e.g. #!/usr/bin/env python.
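To see several of these input and output types together, here is a sketch of a process that takes a tuple (a sample ID plus a file), names its outputs with emit, and runs a Python script through a shebang line. The process name COUNT_LINES, the *.txt input pattern, and the output file names are hypothetical, and the script assumes python3 is available wherever the task runs.

```nextflow
// Hypothetical process: count the lines of each input file with Python
process COUNT_LINES {
    input:
    // A value and a file travel together as one tuple
    tuple val(sample_id), path(reads)

    output:
    // Named outputs, accessed later as COUNT_LINES.out.<name>
    tuple val(sample_id), path("${sample_id}.count.txt"), emit: counts
    stdout emit: summary

    script:
    // The shebang makes Nextflow run this block with Python instead of Bash;
    // ${...} placeholders are filled in by Nextflow before Python sees them
    """
    #!/usr/bin/env python3
    with open("${reads}") as fh:
        n = sum(1 for _ in fh)
    with open("${sample_id}.count.txt", "w") as out:
        out.write(str(n) + "\\n")
    print("${sample_id}: " + str(n) + " lines")
    """
}

workflow {
    // Hypothetical input: every *.txt file in the launch directory,
    // paired with its name (minus extensions) as the sample ID
    samples_ch = Channel.fromPath('*.txt').map { f -> [f.simpleName, f] }

    COUNT_LINES(samples_ch)

    COUNT_LINES.out.counts.view()
    COUNT_LINES.out.summary.view()
}
```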
Channels act as the "pipes" that connect processes by passing data between them.
- They enable asynchronous and parallel execution, making pipelines efficient.
- Creating Channels:
  - For simple values: Channel.of('Hello', 'World') (emits each value separately).
  - For files: Channel.fromPath('/data/some/bigfile.txt') (emits file paths).
  - For lists: Channel.of(['Hello', 'World']).flatten() (emits each item separately).
  - For a single collection: Channel.of(1, 2, 3, 4).collect() (emits [1, 2, 3, 4] as one item).
- Operating on Channels:
  - Use operators like .flatten(), .collect(), .map(), or .view() to manipulate data.
  - Focus on what channels do (pass data) rather than their type (queue or value).
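- Key rule: processes receive their data through channels. If you pass a plain value instead, Nextflow implicitly converts it into a channel. For example, the methods list below is passed directly, and the each input qualifier runs the process once per mode for every sequence file: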
```
  process alignSequences {
    input:
      path seq
      each mode
  
    output:
      path 'result'
  
    script:
    """
      t_coffee -in $seq -mode $mode > result
    """
  }

  workflow {
    sequences = Channel.fromPath('*.fa')
    methods = ['regular', 'espresso', 'psicoffee']
  
    alignSequences(sequences, methods)
    alignSequences.out.view() // one result per (file, mode) pair, e.g. 6 for 2 .fa files x 3 modes
  }
```
- Some channel manipulation examples
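A few channel manipulations in the same spirit, as a sketch: the values and the *.txt pattern are illustrative. Each chain is an independent channel, so lines printed by different chains can interleave in the console.

```nextflow
workflow {
    // Each value is a separate item; map() transforms them one by one
    Channel.of('Hello', 'World')
        .map { word -> word.toUpperCase() }
        .view()                               // HELLO, then WORLD

    // A list is a single item until flatten() splits it
    Channel.of(['Hello', 'World'])
        .flatten()
        .view()                               // Hello, then World

    // filter() keeps matching items; collect() gathers them into one list
    Channel.of(1, 2, 3, 4)
        .filter { n -> n % 2 == 0 }
        .collect()
        .view()                               // [2, 4]

    // Pair each file with its base name, a common pattern before a process
    Channel.fromPath('*.txt')
        .map { f -> [f.baseName, f] }
        .view()
}
```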

Other Concepts


- Workflows: combine processes into reusable workflows with DSL2 for cleaner, more organized pipelines.
- Modules: reuse processes across pipelines by importing them as DSL2 modules, enabling modularity and collaboration.
- Configuration: customize pipeline behavior (e.g., memory, CPUs, or queue settings) in the nextflow.config file.
- Scripting language: Nextflow is built on Groovy and automatically imports certain Groovy and Java classes, so you can use them directly in scripts. For example: params { timestamp = (new Date()).getTime() } (from a YouTube example).
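The Workflows and Modules points can be sketched together: below, a process is wrapped in a named sub-workflow with take, main, and emit sections, and then called from the entry workflow. The names TO_UPPER and SHOUT and the module path in the comment are hypothetical.

```nextflow
// In a multi-file project this process would live in its own module file
// and be imported with an include statement, e.g. (hypothetical path):
//   include { TO_UPPER } from './modules/to_upper'
process TO_UPPER {
    input:
    val word

    output:
    stdout emit: upper

    script:
    """
    echo '${word}' | tr '[:lower:]' '[:upper:]'
    """
}

// Named, reusable sub-workflow: receives a channel, returns a channel
workflow SHOUT {
    take:
    words_ch

    main:
    TO_UPPER(words_ch)

    emit:
    shouted = TO_UPPER.out.upper
}

// Unnamed entry workflow: the one that `nextflow run` executes
workflow {
    SHOUT(Channel.of('hello', 'nextflow'))
    SHOUT.out.shouted.view()
}
```

Because SHOUT depends only on its input channel, another pipeline could include it the same way it includes a process.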

Compute environment


- Execution environments: run locally, on an HPC cluster, or in the cloud, with software provided by containers or Conda. Configure the execution environment in the nextflow.config file.
- Containers: many ready-made container images already exist, so look for one before building your own, e.g. "staphb/bcftools:1.21".
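As a sketch of using a ready-made image, the process below runs bcftools stats inside staphb/bcftools:1.21 through the container directive. The process name BCFTOOLS_STATS, the params.vcf default path, and the output file name are hypothetical. The container is only used when Docker is enabled (docker.enabled = true in nextflow.config, or -with-docker on the command line).

```nextflow
// Hypothetical parameter: input VCF, override with --vcf my.vcf.gz
params.vcf = 'data/sample.vcf.gz'

process BCFTOOLS_STATS {
    // Ready-made image from Docker Hub; no local bcftools install needed
    container 'staphb/bcftools:1.21'

    input:
    path vcf

    output:
    path "${vcf.simpleName}.stats.txt"

    script:
    """
    bcftools stats ${vcf} > ${vcf.simpleName}.stats.txt
    """
}

workflow {
    // checkIfExists makes the run fail early if the VCF path is wrong
    BCFTOOLS_STATS(Channel.fromPath(params.vcf, checkIfExists: true))
    BCFTOOLS_STATS.out.view()
}
```

Saved as stats.nf (a hypothetical name), it could be run with nextflow run stats.nf -with-docker --vcf path/to/your.vcf.gz.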

Command examples


nextflow run test.nf
nextflow run test.nf -with-dag flowchart.png
nextflow run main.nf -profile "awsbatch" -c /app/scripts/nextflow.config -bucket-dir s3://${bucket}/1000genomes/work
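The -profile option selects a named block from nextflow.config. A minimal sketch of such a file, assuming a SLURM cluster and an existing AWS Batch job queue, is shown below. The resource defaults, partition, queue name, and region are placeholders, and this is not the repository's actual configuration.

```nextflow
// nextflow.config (sketch): defaults applied to every process
process {
    cpus   = 1
    memory = '2 GB'
}

// Select one with: nextflow run main.nf -profile <name>
profiles {
    // Used automatically when no -profile is given
    standard {
        process.executor = 'local'
    }

    // HPC cluster managed by SLURM (placeholder partition name)
    slurm {
        process.executor = 'slurm'
        process.queue    = 'my_partition'
    }

    // AWS Batch: every process must also declare a container image,
    // and the work directory must be in S3 (e.g. set with -bucket-dir)
    awsbatch {
        process.executor = 'awsbatch'
        process.queue    = 'my-batch-job-queue'
        aws.region       = 'us-east-1'
    }
}
```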

Let’s Get Started with "1000genome with AWS Fargate"!

- Start your own Nextflow pipeline by following the single-sample implementation tutorial.
- Examine genomic variation across populations with AWS.
- The Nextflow pipeline in the local.run folder is ready to run on your local PC with Docker installed.
- The Nextflow pipeline in the aws.run folder is configured to run on AWS Fargate and is deployed with Terraform.

Step by Step on the Nextflow Journey


Let's make this fun and hands-on! Here's how we'll get started:

- Installation: Bash, Java, Nextflow, Docker, Git, etc.
- Gather the documentation: beginners can use the full Nextflow documentation as a dictionary to look things up.
- Learn the basics: work through the Hello Nextflow training to understand the core concepts of Nextflow, such as processes, channels, and workflows.
- Get hands-on: write your first Nextflow pipeline to solve a real bioinformatics problem.
- Scale to the cloud: extend your pipeline to run on cloud infrastructure for faster, more robust execution.

By the end, you’ll have a working pipeline and the confidence to build more. I’ll provide the tools and support to make your pipelines cloud-ready.

Container Practice


- Remove (or rename) any locally installed bcftools, e.g. which bcftools; mv /opt/homebrew/bin/bcftools /opt/homebrew/bin/bcftools.bk
- Check that the Docker daemon is running, e.g. nextflow run hello -with-docker
- In nextflow.config, set docker.enabled = true, and on a Mac (Apple Silicon) also set docker.runOptions = "--platform linux/amd64".
- To use a custom Dockerfile, start the local Docker daemon, log in to Docker Hub, then build the image and push it to your public or private repository:
  - Build the local image, e.g. docker build -t 1000genome/pca-python:3.9 .
  - Tag it and push it to Docker Hub, e.g. docker tag 1000genome/pca-python:3.9 <your_dockerhub_username>/pca-python:3.9; docker push <your_dockerhub_username>/pca-python:3.9
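The Docker settings above can be written in nextflow.config roughly as follows. This is a sketch: PCA is a hypothetical process name, and <your_dockerhub_username> is the same placeholder used in the commands above.

```nextflow
// nextflow.config (sketch)
docker {
    enabled    = true
    // Only needed on Apple Silicon Macs when an image is published for amd64 only
    runOptions = '--platform linux/amd64'
}

process {
    // Hypothetical process name: run it with the custom image you pushed
    withName: 'PCA' {
        container = '<your_dockerhub_username>/pca-python:3.9'
    }
}
```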

csiro/1000genome

Folders and files

Latest commit

History

Repository files navigation

Welcome to Our "Get Started with Nextflow" Tutorial!

Backgroud

Why Nextflow?

Complexity Concerns

NextFlow Core Concepts

Processes and Channels

Other Concepts

Compute environment

Command example

Let’s Get Started with "1000genome with AWS Fargate "!

Step by Step On Nextflow Journey

Container Practice

AWS practice

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages