---
title: "Creating Automated and Reproducible Pipelines with Nextflow"
author: "Kanishka Manna"
format: revealjs
editor: visual
---


# Fundamentals of Pipelines

## 

Data science pipelines are just like cooking & making Tacos!

![This image is generated by A.I.](images/data_science_cooking.png){fig-alt="This image is generated by AI"}

## What is a Pipeline?

::: incremental
-   A **pipeline** is an automated sequence of data-processing steps, such as collection, cleaning, transformation, analysis, and storage.

-   It involves multiple software packages, often in different environments, and scripts in various programming languages.
:::
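
As a rough illustration of the steps listed above, here is a minimal shell sketch of a five-step pipeline. The file names and the toy data are assumptions made up for this example; a real pipeline would call dedicated tools at each step.

```bash
set -eu

# Hypothetical working directory and file names, chosen only for this sketch.
workdir="$(mktemp -d)"

# Step 1, collection: write a few raw records (a stand-in for downloaded data).
printf 'sample_b,12\n\nsample_a,7\nsample_c,30\n' > "$workdir/raw.csv"

# Step 2, cleaning: drop empty lines.
grep -v '^$' "$workdir/raw.csv" > "$workdir/clean.csv"

# Step 3, transformation: sort the records by sample name.
sort "$workdir/clean.csv" > "$workdir/sorted.csv"

# Step 4, analysis: sum the second column.
total="$(awk -F',' '{ sum += $2 } END { print sum }' "$workdir/sorted.csv")"

# Step 5, storage: save the result next to the intermediate files.
echo "total=$total" > "$workdir/result.txt"
cat "$workdir/result.txt"
```

Each step reads the file written by the previous one, which is the hand-off that a workflow manager automates and tracks.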

## Is it Pipeline or Workflow?

::: incremental
-   A **workflow** is a broader term that encompasses any sequence of tasks or processes.

-   A **pipeline** refers to a specific type of workflow focused on processing and transforming data in a linear or sequential manner.
:::

## A Traditional Bioinformatics Pipeline

![Pipeline for Variant Calling.](images/traditional_pipeline.png)

# Workflow Management Systems (WFMS)

## 

**Workflow Management Systems** (WFMS), such as Apache Airflow, Quickbase, Cuneiform, Common Workflow Language, Galaxy, Snakemake, and Nextflow, are designed to manage computational workflows in fields like bioinformatics and data science.

-   They streamline tasks by defining dependencies and automating processes across different computational environments.
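
To make "defining dependencies" concrete, the sketch below declares which step must finish before which, and lets the standard `tsort` utility derive a valid run order. The step names are hypothetical, and this is not how any particular WFMS is implemented; it only illustrates the idea.

```bash
set -eu

# Each line reads "prerequisite step": the left step must finish before the right one.
# The step names are hypothetical and deliberately listed out of order.
order="$(printf '%s\n' \
    'align call_variants' \
    'call_variants annotate' \
    'trim align' | tsort)"

# Print the derived execution order, one step per line.
echo "$order"
```

For this chain the only valid order is `trim`, `align`, `call_variants`, `annotate`, regardless of the order in which the dependencies were declared.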

## A Traditional Pipeline Wrapped by Scientific WFMS

![Pipeline for Variant Calling wrapped within a Workflow Manager](images/pipeline_workflow_manager.png)

## Key features of WFMS include:

::: incremental
-   Run time management

-   Software management

-   Portability & interoperability

-   Reproducibility

-   Reentrancy
:::
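
Reentrancy means a run can pick up where it left off instead of starting over. A minimal sketch of the idea, assuming a made-up `run_step` helper that skips any step whose output file already exists (Nextflow's own resume mechanism is more involved than this):

```bash
set -eu

# Hypothetical cache directory for this sketch.
cache="$(mktemp -d)"
runs=0

# Hypothetical helper: run a command only if its output file does not exist yet.
run_step() {
    output="$1"
    shift
    if [ -f "$output" ]; then
        echo "skip: $output already exists"
    else
        # Write to a temporary file first so a failed step leaves no output behind.
        "$@" > "$output.tmp"
        mv "$output.tmp" "$output"
        runs=$((runs + 1))
        echo "ran:  $output"
    fi
}

# First pass: both steps are executed.
run_step "$cache/step1.txt" echo "aligned reads"
run_step "$cache/step2.txt" echo "called variants"

# Second pass: both steps are skipped because their outputs are cached.
run_step "$cache/step1.txt" echo "aligned reads"
run_step "$cache/step2.txt" echo "called variants"

echo "steps executed: $runs"
```

Although four calls are made, only two steps actually run, so the last line reports `steps executed: 2`.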

## 

[![](images/Nextflow_logo.png){fig-alt="Nextflow" fig-align="center"}](https://www.nextflow.io/)

**Nextflow** is a scientific workflow framework that a bioinformatician or data scientist can use to integrate all of their Bash/Python/Perl/other scripts into one cohesive pipeline that is portable, reproducible, scalable, and checkpointed.

<https://www.nextflow.io/>

## Nextflow core features

[![Some of the features of Nextflow](images/nextflow_features.png)](https://www.nextflow.io/)

##  {.scrollable}

::::: columns
::: {.column width="50%"}
![](images/Nextflow_logo.png)

-   Groovy-based, flexible, supports complex logic

-   Highly scalable, supports cloud, HPC, and containers (Docker/Singularity)

-   Emphasizes portability with container integration

-   Seqera & growing community, supports multi-platform workflows
:::

::: {.column width="50%"}
![](images/snakemake_logo.png)

-   Python-based, simple, rule-driven

-   Local, clusters, and cloud with built-in scheduler support

-   Strong focus on reproducibility and environment consistency

-   Large bioinformatics community
:::
:::::

# Installation & Setup

## Installing Nextflow

## Setting up our Workshop Directory

Each workshop attendee should set up a training folder, e.g. `nf_workshop`.


```{bash}
mkdir nf_workshop
cd nf_workshop
```


## Downloading Workshop Materials

The scripts and materials for this workshop can be found in the [Nevada Bioinformatics Center](https://github.com/Nevada-Bioinformatics-Center/NBC_nextflow_workshop) GitHub repository.

To download them, run the following command in your terminal/console.


```{bash}
# get the git repository
git clone https://github.com/Nevada-Bioinformatics-Center/NBC_nextflow_workshop.git
```


# Getting Started

## Nextflow 101

## Your first script

```groovy
#!/usr/bin/env nextflow

// Enable DSL 2 syntax
nextflow.enable.dsl = 2

/*
 * A Simple Nextflow script to print 'Hello, welcome to the NBC Nextflow Workshop!'
 */

// Define a process called 'greetings' that will output to standard output
process greetings {

    output:
    stdout

    // The script section that prints the statement
    script:
    """
    echo 'Hello, welcome to the NBC Nextflow Workshop!'
    """
}

// Workflow definition: Calls the 'greetings' process
workflow {
    greetings()
}
```

## Let's break down our simple code .. {.scrollable}

-   The first line, `#!/usr/bin/env nextflow`, is known as the *'Shebang'* line. It tells the operating system to run the script with the `nextflow` executable found on your `PATH`.
-   `nextflow.enable.dsl = 2` is used to enable DSL2 syntax.
-   Note: A line comment in Nextflow starts with `//`, and a block comment is enclosed in `/* .. */`.

## Let's break down our simple code .. {.scrollable}

1.  Defining the process:

-   The `greetings` process will run the command `echo 'Hello, welcome to the NBC Nextflow Workshop!'`

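-   Note: This process does not generate an output file. Its `stdout` output sends the printed statement to an output channel; to show it on the console, that channel has to be displayed, for example by calling `greetings.out.view()` after `greetings()`.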

```groovy
// Define a process called 'greetings' that will output to standard output
process greetings {

    output:
    stdout

    // The script section that prints the statement
    script:
    """
    echo 'Hello, welcome to the NBC Nextflow Workshop!'
    """
}
```

## Let's break down our simple code

2.  Creating the workflow:

-   The `workflow` block runs the `greetings` process.

-   This is where processes are invoked and wired together, which determines their execution order. Since `greetings` is the only process and takes no input, it runs once.

```groovy
// Workflow definition: Calls the 'greetings' process
workflow {
    greetings()
}
```
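
To try the script, one option is to save it to a file and launch it with `nextflow run`. The sketch below assumes the file is called `hello.nf` (the slides do not name it) and only starts Nextflow when both the file and the `nextflow` command are available.

```bash
# Hypothetical file name; adjust it to wherever the script was saved.
script="hello.nf"

# Check the prerequisites first, then launch the pipeline.
if [ ! -f "$script" ]; then
    status="missing-script"
    echo "Save the script as $script first."
elif ! command -v nextflow >/dev/null 2>&1; then
    status="missing-nextflow"
    echo "nextflow was not found on PATH; install it before running."
else
    status="ready"
    nextflow run "$script"
fi
```

The `status` variable is only there to make the three outcomes explicit; on a machine that is already set up, `nextflow run hello.nf` on its own is enough.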