ajodeh-juma/ngs-academy-africa-nfcore
Introduction

This lesson is an introduction to the workflow manager Nextflow, and nf-core, a community effort to collect a curated set of analysis pipelines built using Nextflow.

Nextflow enables scalable and reproducible scientific workflows using software containers such as Docker and Singularity. It allows the adaptation of pipelines written in the most common scripting languages, such as R and Python. Nextflow provides a Domain-Specific Language (DSL) that simplifies the implementation and deployment of complex parallel and reactive workflows on clouds and clusters.

This lesson motivates the use of Nextflow and nf-core as a development tool for building and sharing computational pipelines that facilitate reproducible (data) science workflows.

Login

From the terminal of your local computer, log into the HPC using the following command, then press Enter. You will be prompted to type in your password. On a Linux system, you can use the Ctrl-Alt-T keyboard shortcut to open a terminal.

ssh <train11>@172.16.13.171

Replace <train11> with your assigned user name. If using PuTTY, enter 172.16.13.171 in the Host Name (or IP address) field and open the session. Log in with your user name when prompted and key in your password.

Prep

In your home directory, follow these steps:

Clone the repository into your home directory:

    git clone https://github.com/ajodeh-juma/ngs-academy-africa-nfcore.git

Your first script

  1. Open your first Nextflow script wc.nf using your favourite text editor (nano or vim)

  2. Run the script using nextflow

    nextflow run wc.nf
    
  3. Create a process in the script to print the number of reads in the input file provided. Ensure that you capture the output as stdout

    Quiz: How many reads are in the input file?


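
The counting step can be sketched as a Nextflow process. This is a minimal illustration only (DSL2 assumed; `params.input`, the process name `NUM_READS`, and the sample file path are hypothetical, not the course's reference solution). It relies on the FASTQ convention of four lines per read, so the line count divided by 4 gives the number of reads:

```nextflow
// Minimal sketch: count reads in a FASTQ file and capture stdout.
// params.input, NUM_READS, and the file path are illustrative assumptions.
params.input = "data/sample.fastq"

process NUM_READS {
    input:
    path reads

    output:
    stdout

    script:
    """
    echo \$(( \$(wc -l < $reads) / 4 ))
    """
}

workflow {
    NUM_READS(channel.fromPath(params.input)) | view
}
```

Capturing the output with the stdout qualifier lets the workflow pipe the result straight into view for printing.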

Simple RNA-Seq pipeline

Using Conda/Bioconda

  1. Create a conda environment

    conda env create -f environment.yaml
    

    The environment.yaml file lists all the required tools/software and dependencies for the simple pipeline that we will run. (You can preview the file.)

  2. Activate the conda environment

    conda activate rnaseq-env
    
  3. Run the script using the conda profile

    nextflow run main.nf -profile conda
    
  4. In the environment.yaml file, add fastp as a dependency and update the conda environment using the command:

    conda env update -f environment.yaml
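
After the edit, environment.yaml might look roughly like this. The sketch below is illustrative only: the actual file in the repository may pin different tools and versions, so treat the entries other than fastp as assumptions.

```yaml
# Illustrative sketch of environment.yaml after adding fastp;
# the existing tool list and versions are assumptions, not the repo's actual file.
name: rnaseq-env
channels:
  - conda-forge
  - bioconda
dependencies:
  - salmon
  - fastqc
  - multiqc
  - fastp   # newly added dependency
```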
    

Exercise

  1. In your workflow:

    (a). Add a process that preprocesses the raw reads using fastp and use the preprocessed reads as input for the quantification step with salmon.

    (b). Emit the .json, .html and .log files as output channels.

    (c). Use the .json outputs as input to the multiqc process to summarize and visualize the results.

    (d). Add a process that counts the number of reads before (raw reads) and after (preprocessed reads). Print the output(s) in stdout.
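
A preprocessing process along the lines of exercise (a) and (b) could be sketched as follows. This is one possible shape, not the reference solution: the process name, tuple structure, file-naming pattern, and the exact fastp flags are assumptions (fastp's -i, -o, --json and --html options do exist, but single-end input is assumed here).

```nextflow
// Hypothetical FASTP process sketch (DSL2, single-end reads assumed).
// Names, file patterns, and flags are illustrative, not the course solution.
process FASTP {
    input:
    tuple val(sample_id), path(reads)

    output:
    tuple val(sample_id), path("${sample_id}.trimmed.fastq.gz"), emit: reads
    path "${sample_id}.fastp.json", emit: json
    path "${sample_id}.fastp.html", emit: html
    path "${sample_id}.fastp.log",  emit: log

    script:
    """
    fastp -i $reads \\
          -o ${sample_id}.trimmed.fastq.gz \\
          --json ${sample_id}.fastp.json \\
          --html ${sample_id}.fastp.html \\
          2> ${sample_id}.fastp.log
    """
}
```

The named emit: channels make it easy to route only the .json outputs into multiqc while passing the trimmed reads on to salmon.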

Using Docker

  1. Deactivate the conda environment

    conda deactivate
    
  2. Build a Docker image

    docker build -t rnaseq-image .
    

    This may take a couple of minutes.

  3. Test the container by checking the Salmon version

    docker run rnaseq-image salmon --version
    

    Exercise

    1. Mount the parent directory into the container at an identical path using the -v (or --volume) flag, then generate the genome index with salmon by running the container in interactive mode.

    2. Run the script using Docker

      nextflow run main.nf -with-docker rnaseq-image
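
The interactive mount in exercise 1 might look something like this. The commands are a hedged sketch: mounting $PWD at the same path inside the container keeps host and container paths identical, but the transcriptome file name and index directory are placeholders you would replace with the actual files in the repository.

```shell
# Run the container interactively, mounting the current directory at the
# same path inside the container (file names below are illustrative).
docker run -it -v $PWD:$PWD -w $PWD rnaseq-image bash

# Inside the container, build the salmon index:
salmon index -t transcriptome.fa -i salmon_index
```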
      

About

Advanced Training in Bioinformatics Workflows: for beginner- to intermediate-level bioinformaticians working with next-generation sequencing data in East Africa.
