# Computational Workflows for Biomedical Data

**Welcome to the course Computational Workflows for Biomedical Data. Over the next two weeks, you will learn how to leverage nf-core pipelines to analyze biomedical data and gain hands-on experience in creating your own pipelines, with a strong emphasis on Nextflow and nf-core.**

**Course Structure:**

- **Week 1: You will use a variety of nf-core pipelines to analyze a publicly available biomedical study.**
- **Week 2: We will shift focus to learning the basics of Nextflow, enabling you to design and implement your own computational workflows.**
- **Final Project: In the last couple of days, you will apply your knowledge to create a custom pipeline for analyzing biomedical data using Nextflow and the nf-core template.**

## Basics

**If you have not installed all required software, please do so as soon as possible!**


**If you already installed all software, please go on and start answering the questions in this notebook. If you have any questions, don't hesitate to approach us.**

**1. What is nf-core?**

The nf-core framework is a means of developing collaborative, peer-reviewed, best-practice analysis pipelines. The pipelines are written in Nextflow and can be executed on most computational infrastructures. They have native support for container technologies like Docker.

**2. How many pipelines are there currently in nf-core?**

112 (03.09.2024)

**3. Are there any non-bioinformatic pipelines in nf-core?**

Yes, for example spinningjenny, a pipeline for simulating the first industrial revolution.

**4. Let's go back a couple of steps. What is a pipeline and what do we use it for?**

A pipeline consists of multiple processes. The output of one process becomes the input of the next.
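As a minimal Python sketch of this idea, each function below stands in for one process, and the pipeline passes every output on as the next input. The step names (`trim_reads`, `count_bases`, `most_common_base`) and the input reads are hypothetical and are not taken from any nf-core pipeline.

```python
from functools import reduce

def trim_reads(reads):
    # Hypothetical step 1: drop the last two characters of every read.
    return [read[:-2] for read in reads]

def count_bases(reads):
    # Hypothetical step 2: count how often each base occurs.
    counts = {}
    for read in reads:
        for base in read:
            counts[base] = counts.get(base, 0) + 1
    return counts

def most_common_base(counts):
    # Hypothetical step 3: report the most frequent base.
    return max(counts, key=counts.get)

def run_pipeline(data, steps):
    # Feed the output of each step into the next one.
    return reduce(lambda value, step: step(value), steps, data)

result = run_pipeline(["ACGTAA", "AAGTCC"], [trim_reads, count_bases, most_common_base])
print(result)
```

A workflow manager such as Nextflow adds what this sketch leaves out: parallel execution, containers, and caching of intermediate results.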

**5. Why do you think nf-core adheres to strict guidelines?**

Reproducibility is very important in science. Adhering to shared guidelines keeps the pipelines consistent with each other, which makes them easier to review, maintain, and rerun.

**6. What are the main features of nf-core pipelines?**

- collaborative
- peer-reviewed
- open source
- can be executed on most infrastructures
- support for container technologies
- stable releases


## Let's start using the pipelines

**1. Find the nf-core pipeline used to measure differential abundance of genes**

differentialabundance

**Run the pipeline in a cell**

In [2]:
# run the pipeline in the test profile using docker containers
# make sure to specify the version you want to use (use the latest one)

!nextflow run nf-core/differentialabundance -profile test,docker --outdir /home/satan/Documents/computational_workflows/Day1/test4_terminal


Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nf-core/differentialabundance` [distraught_pauling] DSL2 - revision: 3dd360fed0 [master]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    : master

Due to permission issues, all other commands were run successfully on the command line. The output above is included as an example.
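The comment in the cell above asks for an explicit version, which Nextflow accepts through its `-r` option. The following sketch assembles such a command in Python. The helper `build_nextflow_command` and the output directory `results` are hypothetical, and `1.5.0` is assumed to be the release tag that matches the `v1.5.0` reported in the log.

```python
import shlex

def build_nextflow_command(pipeline, profiles, outdir, revision=None, resume=False):
    # Assemble the argument list for `nextflow run`.
    cmd = ["nextflow", "run", pipeline]
    if revision is not None:
        # -r pins the pipeline to a release tag, branch, or commit.
        cmd += ["-r", revision]
    cmd += ["-profile", ",".join(profiles), "--outdir", outdir]
    if resume:
        cmd.append("-resume")
    return cmd

cmd = build_nextflow_command(
    "nf-core/differentialabundance",
    profiles=["test", "docker"],
    outdir="results",   # hypothetical output directory
    revision="1.5.0",   # assumed tag for the version shown in the log
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` to launch the pipeline, provided Nextflow and Docker are installed.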

**Repeat the run. What changed?**

In [None]:

!nextflow run nf-core/differentialabundance -profile test,docker --outdir /home/satan/Documents/computational_workflows/Day1/test5_terminal



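The results were the same, but execution time and RAM/CPU usage differed.
The second run was faster, most likely because the pipeline code and container images had already been downloaded during the first run. Without `-resume`, Nextflow still executes every task again.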

**Now add -resume to the command. What changed?**

In [None]:
!nextflow run nf-core/differentialabundance -profile test,docker --outdir /home/satan/Documents/computational_workflows/Day1/test6_terminal -resume

The results were the same, but execution time and RAM/CPU usage differed.
The `-resume` run was much faster because tasks that had already completed were taken from the cache instead of being executed again.
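The effect of `-resume` can be illustrated with a toy cache in Python. This is only a sketch of the general idea and not Nextflow's actual implementation. The class `TaskCache` and the task `sum_counts` are hypothetical. Results are stored under a hash of the task name and its inputs, and they are reused only when resuming is requested and the hash is already known.

```python
import hashlib
import json

class TaskCache:
    """Toy resume cache: results are keyed by a hash of task name and inputs."""

    def __init__(self):
        self.store = {}      # plays the role of the work directory
        self.executed = []   # names of tasks that actually ran

    def _key(self, name, inputs):
        payload = json.dumps({"task": name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, name, func, inputs, resume=False):
        key = self._key(name, inputs)
        if resume and key in self.store:
            return self.store[key]   # cache hit: skip execution
        result = func(*inputs)
        self.executed.append(name)
        self.store[key] = result
        return result

def sum_counts(*values):
    return sum(values)

cache = TaskCache()
cache.run("sum_counts", sum_counts, [1, 2, 3])               # first run: executes
cache.run("sum_counts", sum_counts, [1, 2, 3])               # no resume: executes again
cache.run("sum_counts", sum_counts, [1, 2, 3], resume=True)  # resume: cache hit
cache.run("sum_counts", sum_counts, [1, 2, 4], resume=True)  # changed input: executes
print(len(cache.executed))
```

Emptying `cache.store` in this sketch corresponds to deleting the cached results: a later call with `resume=True` has nothing to reuse and executes the task again.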

**Check out the current directory. Next to the outdir you specified, what else has changed?**

In the working directory, multiple log files (`.nextflow.log`) have been created.
Additionally, `.nextflow/cache` stores the cache metadata of the runs.
A `work` directory has been created that contains all (intermediate) results.
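To check this from within the notebook, a small Python helper can list these items. This is a sketch and the function name `nextflow_artifacts` is hypothetical. It looks for `.nextflow.log` files (including rotated ones such as `.nextflow.log.1`), the `.nextflow` directory, and the `work` directory in the launch directory.

```python
from pathlib import Path

def nextflow_artifacts(launch_dir):
    # Collect the files and directories that a Nextflow run leaves behind.
    root = Path(launch_dir)
    found = sorted(p.name for p in root.glob(".nextflow.log*"))
    for name in (".nextflow", "work"):
        if (root / name).is_dir():
            found.append(name)
    return found

print(nextflow_artifacts("."))
```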

**Delete the work directory and run the pipeline again using -resume. What changed?**

In [None]:
!nextflow run nf-core/differentialabundance -profile test,docker --outdir /home/satan/Documents/computational_workflows/Day1/test7_terminal -resume

**What changed?**

The warning

```
WARN: It appears you have never run this project before -- Option `-resume` is ignored
```

appears.

The cached results can no longer be found because they have been deleted, so all tasks are executed again and the run takes longer (the same as the second run).

## Let's look at the results

### What is differential abundance analysis?

Differential abundance analysis is a method used to identify differences in the abundance of individual features (for example genes, proteins, or taxa) between two or more groups.
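The core quantity behind such a comparison is often the log2 fold change between group means. The sketch below computes it in Python for made-up counts of two hypothetical genes. It covers only the effect size: real tools also model the variance between samples and test for statistical significance, which this example does not attempt.

```python
import math

def log2_fold_change(control, treated, pseudocount=1.0):
    # Compare mean abundance between groups; the pseudocount avoids log(0).
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))

# Made-up counts for two hypothetical genes, three samples per group.
counts = {
    "geneA": {"control": [7, 7, 7], "treated": [31, 31, 31]},
    "geneB": {"control": [15, 15, 15], "treated": [15, 15, 15]},
}

for gene, groups in counts.items():
    lfc = log2_fold_change(groups["control"], groups["treated"])
    print(gene, round(lfc, 2))
```

Here `geneA` has a log2 fold change of 2.0 (a fourfold increase after adding the pseudocount), while `geneB` stays at 0.0 because both groups have the same mean.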

**Give the most important plots from the report:**

![Boxplot](data/boxplot.png "Boxplot")

![Density plot](data/density.png "Density plot")