ajodeh-juma/ngs-academy-africa-nfcore
Introduction

This lesson is an introduction to the workflow manager Nextflow, and nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow.

Nextflow enables scalable and reproducible scientific workflows using software containers such as Docker and Singularity. It allows the adaptation of pipelines written in the most common scripting languages, such as R and Python. Nextflow provides a Domain-Specific Language (DSL) that simplifies the implementation and deployment of complex parallel and reactive workflows on clouds and clusters.

This lesson motivates the use of Nextflow and nf-core as a development tool for building and sharing computational pipelines that facilitate reproducible (data) science workflows.

Login

From the terminal of your local computer, log into the HPC using the following command, then press Enter. You will be prompted to type in your password. On a Linux system, you can use the Ctrl-Alt-T keyboard shortcut to open a terminal.

ssh <train11>@172.16.13.171

Replace <train11> with your assigned user name. If using PuTTY, enter 172.16.13.171 in the Host Name (or IP address) field and open the session. Log in with your user name when prompted and key in your password.

Prep

In your home directory, follow these steps:

Clone the repository into your home directory:

    git clone https://github.com/ajodeh-juma/ngs-academy-africa-nfcore.git

Your first script

  1. Open your first Nextflow script wc.nf using your favourite text editor (nano or vim)

  2. Run the script using nextflow

    nextflow run wc.nf
    
  3. Create a process in the script to print the number of reads in the input file provided. Ensure that you capture the output as stdout

    Quiz: How many reads are in the input file?


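
The counting step can be sketched as a Nextflow process. This is a minimal illustration only (DSL2 assumed; `params.input`, the process name `NUM_READS`, and the sample file path are hypothetical, not the course's reference solution). It relies on the FASTQ convention of four lines per read, so the line count divided by 4 gives the number of reads:

```nextflow
// Minimal sketch: count reads in a FASTQ file and capture stdout.
// params.input, NUM_READS, and the file path are illustrative assumptions.
params.input = "data/sample.fastq"

process NUM_READS {
    input:
    path reads

    output:
    stdout

    script:
    """
    echo \$(( \$(wc -l < $reads) / 4 ))
    """
}

workflow {
    NUM_READS(channel.fromPath(params.input)) | view
}
```

Capturing the output with the stdout qualifier lets the workflow pipe the result straight into view for printing.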

Simple RNA-Seq pipeline

Using Conda/Bioconda

  1. Create a conda environment

    conda env create -f environment.yaml
    

    The environment.yaml file lists all the required tools/software and dependencies for the simple pipeline that we will run. (You can preview the file.)

  2. Activate the conda environment

    conda activate rnaseq-env
    
  3. Run the script using the conda profile

    nextflow run main.nf -profile conda
    
  4. In the environment.yaml file, add fastp as a dependency and update the conda environment using the command:

    conda env update -f environment.yaml
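
After the edit, environment.yaml might look roughly like this. The sketch below is illustrative only: the actual file in the repository may pin different tools and versions, so treat the entries other than fastp as assumptions.

```yaml
# Illustrative sketch of environment.yaml after adding fastp;
# the existing tool list and versions are assumptions, not the repo's actual file.
name: rnaseq-env
channels:
  - conda-forge
  - bioconda
dependencies:
  - salmon
  - fastqc
  - multiqc
  - fastp   # newly added dependency
```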
    

Exercise

  1. In your workflow:

    (a). Add a process that preprocesses the raw reads using fastp and use the preprocessed reads as input for the quantification step with salmon.

    (b). Emit the .json, .html and .log files as output channels.

    (c). Use the .json outputs as input to the multiqc process to summarize and visualize the results.

    (d). Add a process that counts the number of reads before (raw reads) and after (preprocessed reads). Print the output(s) in stdout.
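
A preprocessing process along the lines of exercise (a) and (b) could be sketched as follows. This is one possible shape, not the reference solution: the process name, tuple structure, file-naming pattern, and the exact fastp flags are assumptions (fastp's -i, -o, --json and --html options do exist, but single-end input is assumed here).

```nextflow
// Hypothetical FASTP process sketch (DSL2, single-end reads assumed).
// Names, file patterns, and flags are illustrative, not the course solution.
process FASTP {
    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path("${sample_id}.trimmed.fastq.gz"), emit: reads
    path "${sample_id}.fastp.json", emit: json
    path "${sample_id}.fastp.html", emit: html
    path "${sample_id}.fastp.log",  emit: log

    script:
    """
    fastp -i $reads \\
          -o ${sample_id}.trimmed.fastq.gz \\
          --json ${sample_id}.fastp.json \\
          --html ${sample_id}.fastp.html \\
          2> ${sample_id}.fastp.log
    """
}
```

The named emit: channels make it easy to route only the .json outputs into multiqc while passing the trimmed reads on to salmon.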

Using Docker

  1. Deactivate the conda environment

    conda deactivate
    
  2. Build a Docker image

    docker build -t rnaseq-image .
    

    This may take a couple of minutes.

  3. Test the container by checking the Salmon version

    docker run rnaseq-image salmon --version
    

    Exercise

    1. Mount the parent directory into the container at an identical path using the -v (or --volume) flag, then generate the genome index with salmon by running the container in interactive mode.

    2. Run the script using Docker

      nextflow run main.nf -with-docker rnaseq-image
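
The interactive mount in exercise 1 might look something like this. The commands are a hedged sketch: mounting $PWD at the same path inside the container keeps host and container paths identical, but the transcriptome file name and index directory are placeholders you would replace with the actual files in the repository.

```shell
# Run the container interactively, mounting the current directory at the
# same path inside the container (file names below are illustrative).
docker run -it -v $PWD:$PWD -w $PWD rnaseq-image bash

# Inside the container, build the salmon index:
salmon index -t transcriptome.fa -i salmon_index
```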
      

About

Advanced Training in Bioinformatics Workflows: for beginner- to intermediate-level bioinformaticians working with next-generation sequencing data in East Africa.
