In this section, we walk through an [example](https://www.nextflow.io/example4.html) provided by [Nextflow](https://www.nextflow.io/) that describes a workflow for RNA-seq.

# Installation

```bash
curl -fsSL get.nextflow.io | bash
```

# RNA-Seq pipeline

This example shows how to put together a basic RNA-Seq pipeline. It indexes a reference transcriptome, runs quality control on a collection of read pairs, and quantifies each read pair against the index.

***rna-seq.nf***

```groovy
#!/usr/bin/env nextflow

/*
 * The following pipeline parameters specify the reference genomes
 * and read pairs and can be provided as command line options
 */
params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq"
params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
params.outdir = "results"

workflow {
    read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )

    INDEX(params.transcriptome)
    FASTQC(read_pairs_ch)
    QUANT(INDEX.out, read_pairs_ch)
}

process INDEX {
    tag "$transcriptome.simpleName"

    input:
    path transcriptome

    output:
    path 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}

process FASTQC {
    tag "FASTQC on $sample_id"
    publishDir params.outdir

    input:
    tuple val(sample_id), path(reads)

    output:
    path "fastqc_${sample_id}_logs"

    script:
    """
    fastqc.sh "$sample_id" "$reads"
    """
}

process QUANT {
    tag "$pair_id"
    publishDir params.outdir

    input:
    path index
    tuple val(pair_id), path(reads)

    output:
    path pair_id

    script:
    """
    salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
    """
}
```

## Let's dig deeper and look at some explanations

### Pipeline Parameters:
```groovy
params.reads = "$baseDir/data/ggal/ggal_gut_{1,2}.fq"
params.transcriptome = "$baseDir/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
params.outdir = "results"
```

- `params.reads:` Specifies the input read pairs. The `{1,2}` in the path is a glob pattern that matches the two read files of a sample, whose names end in `_1.fq` and `_2.fq`.
- `params.transcriptome:` Specifies the reference transcriptome file.
- `params.outdir:` Specifies the output directory where results will be saved.
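
The comment in the script says these parameters can be provided as command line options. A minimal sketch of what that looks like, assuming the script is saved as `rna-seq.nf` next to the `nextflow` launcher; the `my_results` directory name is a hypothetical value chosen for illustration. The block only assembles and prints the command so it can be reviewed before running it.

```bash
# Hypothetical values for illustration; substitute your own paths.
reads='data/ggal/ggal_gut_{1,2}.fq'   # single quotes keep the shell from touching the pattern
outdir='my_results'

# Each --<name> option overrides the matching params.<name> default in rna-seq.nf.
cmd="./nextflow run rna-seq.nf --reads '$reads' --outdir $outdir"

# Print the command for review; paste the printed line into a terminal to run it.
echo "$cmd"
```

Quoting the `--reads` value matters: it lets Nextflow, rather than the shell, interpret the `{1,2}` pattern.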

### Workflow Definition:

```groovy
workflow {
    read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )

    INDEX(params.transcriptome)
    FASTQC(read_pairs_ch)
    QUANT(INDEX.out, read_pairs_ch)
}
```

- `workflow:` Defines the workflow of the pipeline.
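- `read_pairs_ch:` Holds the channel created by `channel.fromFilePairs`, which emits one tuple per read pair: the sample ID followed by the list of its two read files. With `checkIfExists: true`, the run fails if no file matches the pattern.
- `INDEX, FASTQC, QUANT:` These are the processes in the workflow. Nextflow schedules them by data dependency rather than in the order written: `INDEX` and `FASTQC` can start right away, while `QUANT` waits for the output of `INDEX`.
  - `INDEX:` Indexes the reference transcriptome.
  - `FASTQC:` Performs quality control on the input read pairs.
  - `QUANT:` Quantifies transcript expression using the indexed transcriptome and the input read pairs.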
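
To make the shape of the items in `read_pairs_ch` more concrete, here is a rough shell analogy of the grouping that `channel.fromFilePairs` performs. It is only a sketch, not how Nextflow implements it: it creates two empty placeholder files in a scratch directory (an assumption made so the example is self-contained) and pairs them by their shared prefix.

```bash
# Create two empty placeholder read files in a scratch directory (illustrative only).
workdir=$(mktemp -d)
touch "$workdir/ggal_gut_1.fq" "$workdir/ggal_gut_2.fq"

# For every forward-read file, derive the sample ID and look for its mate.
pairs=""
for fwd in "$workdir"/*_1.fq; do
    sample_id=$(basename "$fwd" _1.fq)      # ggal_gut_1.fq -> ggal_gut
    rev="$workdir/${sample_id}_2.fq"
    if [ -e "$rev" ]; then
        # Same layout as a channel item: [sample ID, [read 1, read 2]]
        pairs="[${sample_id}, [$(basename "$fwd"), $(basename "$rev")]]"
        echo "$pairs"
    fi
done

# Clean up the scratch directory.
rm -rf "$workdir"
```

Each printed line corresponds to one item that `FASTQC` and `QUANT` receive as `tuple val(...), path(reads)`.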

### Process Definitions:

#### 1. INDEX Process:

```groovy
process INDEX {
    tag "$transcriptome.simpleName"

    input:
    path transcriptome

    output:
    path 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}
```

- `tag:` Labels each task with the `simpleName` of the transcriptome file, that is, the file name with all extensions removed.
- `input:` Declares the transcriptome file as the input path.
- `output:` Declares the `index` directory, which holds the indexed transcriptome, as the output.
- `script:` Runs the `salmon index` command on the transcriptome, using the number of CPUs assigned to the task.
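
The label produced by `$transcriptome.simpleName` can be mimicked with shell parameter expansion. This is only an analogy to show which part of the file name survives; the variable names `simple_name` and `base_name` are chosen here for illustration.

```bash
# File name of the default transcriptome used in the pipeline.
file="ggal_1_48850000_49020000.Ggal71.500bpflank.fa"

# Like simpleName: drop everything from the first dot onward.
simple_name="${file%%.*}"

# For contrast, like baseName: drop only the last extension.
base_name="${file%.*}"

echo "simpleName: $simple_name"
echo "baseName:   $base_name"
```

Using the shorter form keeps the task label readable in the Nextflow log.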

#### 2. FASTQC Process:

```groovy
process FASTQC {
    tag "FASTQC on $sample_id"
    publishDir params.outdir

    input:
    tuple val(sample_id), path(reads)

    output:
    path "fastqc_${sample_id}_logs"

    script:
    """
    fastqc.sh "$sample_id" "$reads"
    """
}
```

- `tag:` Tags the process with a description including the sample ID.
- `publishDir:` Specifies the directory where the output files will be saved.
- `input:` Defines a tuple containing the sample ID and the paths to the sample's read files.
- `output:` Defines the output directory for the FASTQC logs.
- `script:` Runs the `fastqc.sh` helper script, which performs the FASTQC analysis on the sample's read files.

#### 3. QUANT Process:

```groovy
process QUANT {
    tag "$pair_id"
    publishDir params.outdir

    input:
    path index
    tuple val(pair_id), path(reads)

    output:
    path pair_id

    script:
    """
    salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
    """
}
```

- `tag:` Tags the process with the pair ID.
- `publishDir:` Specifies the directory where the output files will be saved.
- `input:` Defines two inputs: the index directory produced by `INDEX`, and a tuple containing the pair ID and the paths to its read files.
- `output:` Defines the output directory for the quantification results, named after the pair ID.
- `script:` Runs the `salmon quant` command to quantify transcript expression, passing the first and second read file of the pair to `-1` and `-2`.
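
Taken together, the two `publishDir` directives and the `output:` declarations suggest that a run with the default parameters leaves two entries in `results`. The following sketch checks for them; the expected names are derived from the process definitions above, not from an actual run, so treat them as an assumption.

```bash
# Defaults taken from the pipeline above; ggal_gut is the ID of the default read pair.
outdir="results"
sample_id="ggal_gut"

# Names declared in the output: blocks of FASTQC and QUANT.
expected="fastqc_${sample_id}_logs ${sample_id}"

# Report which of the expected entries are present after a run.
for name in $expected; do
    if [ -e "$outdir/$name" ]; then
        echo "found:   $outdir/$name"
    else
        echo "missing: $outdir/$name"
    fi
done
```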

# Try it on your computer

Instead of running this `.nf` file by hand, we can let Nextflow automatically download the pipeline from its [GitHub repository](https://github.com/nextflow-io/rnaseq-nf), pull the associated `Docker` images, and then run the pipeline. This requires Docker to be installed and running. The command below performs all of these steps:

```bash
./nextflow run rnaseq-nf -with-docker
```