# Computational Workflows for Biomedical Data

Welcome to the course Computational Workflows for Biomedical Data. Over the next two weeks, you will learn how to leverage nf-core pipelines to analyze biomedical data and gain hands-on experience in creating your own pipelines, with a strong emphasis on Nextflow and nf-core.

Course Structure:

- Week 1: You will use a variety of nf-core pipelines to analyze a publicly available biomedical study.
- Week 2: We will shift focus to learning the basics of Nextflow, enabling you to design and implement your own computational workflows.
- Final Project: During the last couple of days, you will apply your knowledge to create a custom pipeline for analyzing biomedical data using Nextflow and the nf-core template.

## Basics

If you have not installed all the required software yet, please do so as soon as possible!


If you have already installed all the software, please go ahead and start answering the questions in this notebook. If you have any questions, don't hesitate to approach us.

1. What is nf-core?

nf-core is a global community effort to collect a curated set of open-source analysis pipelines built using Nextflow [[Source]](https://nf-co.re/).

2. How many pipelines are there currently in nf-core?

At the time of writing, there are 112 pipelines in nf-core (66 released, 33 under development, 13 archived) [[Source]](https://nf-co.re/).

3. Are there any non-bioinformatic pipelines in nf-core?

Yes, for example `nf-core/meerpipe`, which is an astronomy pipeline [[Source]](https://nf-co.re/meerpipe/dev/).

4. Let's go back a couple of steps. What is a pipeline and what do we use it for?

A pipeline is a structured sequence of processes designed to carry out data analyses systematically. Pipelines are used to automate repetitive tasks and to make data analyses reproducible.
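As a toy illustration only (plain Python, not Nextflow), a pipeline can be pictured as a fixed list of steps in which each step consumes the previous step's output. The step functions below (`trim`, `filter_short`, `count`) are hypothetical stand-ins for real analysis tools.

```python
from typing import Callable, List


def trim(reads: List[str]) -> List[str]:
    # Hypothetical step: drop the last base of each read (stand-in for trimming).
    return [r[:-1] for r in reads]


def filter_short(reads: List[str], min_len: int = 3) -> List[str]:
    # Hypothetical step: keep only reads with at least min_len bases.
    return [r for r in reads if len(r) >= min_len]


def count(reads: List[str]) -> int:
    # Hypothetical step: summarise the result as a read count.
    return len(reads)


def run_pipeline(data, steps: List[Callable]):
    # Apply the steps in a fixed order; the same input always yields the same output.
    for step in steps:
        data = step(data)
    return data


if __name__ == "__main__":
    print(run_pipeline(["ACGT", "AC", "GGGTT"], [trim, filter_short, count]))  # 2
```

Because the order and parameters of the steps are fixed in code rather than typed by hand each time, the analysis can be repeated exactly, which is the automation and reproducibility argument made above.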

5. Why do you think nf-core adheres to strict guidelines?

Since nf-core pipelines are the collaborative work of many different people, strict guidelines are necessary to keep the pipelines consistently structured and therefore easy for anyone to understand. They also support reproducibility by requiring packaged software, stable releases, and extensive documentation.

6. What are the main features of nf-core pipelines?

- Extensive Documentation
- Continuous-Integration Testing
- Stable Releases
- Open Source
- Packaged Software
- Run Anywhere

[[Source]](https://nf-co.re/)

## Let's start using the pipelines

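1. Find the nf-core pipeline used to measure differential abundance of genes.

The pipeline is `nf-core/differentialabundance` [[Source]](https://nf-co.re/differentialabundance).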

In [1]:
# run the pipeline in a cell 
# to run bash in jupyter notebooks, simply use ! before the command
# e.g.

!pwd

# For the tasks in the first week, please use the command line to run your commands and simply paste the commands you used in the respective cells!

/mnt/c/Users/julia/Documents/Uni/Master/Semester_2/CompWorkflows


In [3]:
# run the pipeline in the test profile using docker containers
# make sure to specify the version you want to use (use the latest one)

!nextflow run nf-core/differentialabundance -profile test,docker --outdir './results'

Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nf-core/differentialabundance` [thirsty_pauling] DSL2 - revision: 3dd360fed0 [master]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    : master

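The cell's comment asks for a pinned version, but the command above runs the default branch (`revision: master` in the log). Nextflow's `-r` option selects a specific release; 1.5.0 is the version the log reports. A minimal sketch of assembling the same command from Python with a pinned revision; the helper name `build_command` is hypothetical, and the list can be handed to `subprocess.run(cmd, check=True)` to execute it.

```python
import shlex
from typing import List


def build_command(revision: str, outdir: str = "./results", resume: bool = False) -> List[str]:
    # Hypothetical helper: the nextflow call used in this notebook,
    # with the pipeline release pinned via Nextflow's -r option.
    cmd = [
        "nextflow", "run", "nf-core/differentialabundance",
        "-r", revision,
        "-profile", "test,docker",
        "--outdir", outdir,
    ]
    if resume:
        cmd.append("-resume")  # re-use cached task results from earlier runs
    return cmd


if __name__ == "__main__":
    # Print a shell-ready string that can be pasted after "!" in a notebook cell.
    print(shlex.join(build_command("1.5.0")))
```

Pinning the revision means that rerunning the notebook later uses the same pipeline code even after new releases appear on `master`.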
In [4]:
# repeat the run. What changed?
!nextflow run nf-core/differentialabundance -profile test,docker --outdir './results'

N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nf-core/differentialabundance` [modest_lamarck] DSL2 - revision: 3dd360fed0 [master]

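Instead of 12 minutes, the run only took 4 minutes. Without `-resume`, all tasks were executed again; the shorter run time is most likely because the pipeline code and the Docker images had already been downloaded during the first run.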

In [5]:
# now add -resume to the command. What changed?
!nextflow run nf-core/differentialabundance -profile test,docker --outdir './results' -resume

N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nf-core/differentialabundance` [scruffy_visvesvaraya] DSL2 - revision: 3dd360fed0 [master]

Cached processes were re-used, which resulted in an even shorter run time (less than 1 minute).

Check out the current directory. Besides the output directory you specified, what else has changed?

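Several directories have been created (`./.nextflow`, `./null`, `./work`), as well as a `.nextflow.log` file. The `work` directory contains one subdirectory per task execution with its intermediate files, and `.nextflow` stores the run history and cache metadata that `-resume` relies on.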

In [6]:
# delete the work directory and run the pipeline again using -resume. What changed?
!rm -rf ./work
!nextflow run nf-core/differentialabundance -profile test,docker --outdir './results' -resume

N E X T F L O W  ~  version 23.10.1
Launching `https://github.com/nf-core/differentialabundance` [tender_baekeland] DSL2 - revision: 3dd360fed0 [master]

What changed?

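The run took about 5 minutes again, similar to the second run without `-resume`. The cached task results are stored in the `work` directory, so after deleting it no cached task executions could be re-used.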
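Conceptually, `-resume` hashes each task's script and inputs and looks for a finished result stored under that hash in `work`. The sketch below is a heavy simplification under that assumption: Nextflow's real task hash covers more fields and its cache index lives in `.nextflow`, and all names here (`task_hash`, `run_task`, the `deseq2` task) are hypothetical. It reproduces the three observations above: a plain re-run executes everything, `-resume` hits the cache, and deleting `work` forces re-execution.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def task_hash(script: str, inputs: dict) -> str:
    # Toy hash over the task script and its inputs, independent of key order.
    # (Nextflow's real task hash covers more fields than this.)
    h = hashlib.sha256(script.encode())
    for key in sorted(inputs):
        h.update(f"{key}={inputs[key]}".encode())
    return h.hexdigest()[:12]


def run_task(work: Path, script: str, inputs: dict, resume: bool) -> str:
    # With resume on, re-use a stored result whose hash matches; otherwise
    # (re)compute the result and store it in a hash-named task directory.
    task_dir = work / task_hash(script, inputs)
    result = task_dir / "result.txt"
    if resume and result.exists():
        return "cached"
    task_dir.mkdir(parents=True, exist_ok=True)
    result.write_text(f"output of {script} on {inputs}")
    return "executed"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        task = ("deseq2", {"counts": "counts.tsv"})  # hypothetical task
        print(run_task(work, *task, resume=False))  # plain run: executed
        print(run_task(work, *task, resume=True))   # -resume: cached
        shutil.rmtree(work)                         # delete the work directory
        print(run_task(work, *task, resume=True))   # -resume, work/ gone: executed
```

Keying results by a hash of the inputs is what lets a changed parameter or input file trigger re-execution of only the affected tasks.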

## Let's look at the results

### What is differential abundance analysis?

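Differential abundance analysis tests whether the abundance of features, such as genes in RNA-seq count data, differs between experimental conditions. In single-cell data, the term also refers to testing for changes in the abundance of cell states between conditions, which can help identify cell state changes associated with disease or treatment.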


Source:
Dann, E., Henderson, N.C., Teichmann, S.A. et al. Differential abundance testing on single-cell data using k-nearest neighbor graphs. Nat Biotechnol 40, 245–253 (2022). https://doi.org/10.1038/s41587-021-01033-z
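The effect size behind such a test is the fold change of a feature between conditions, usually reported on a log2 scale. A toy sketch with made-up normalized counts; this is not DESeq2's estimator (which fits a negative binomial model), and the pseudocount of 1 is an assumption to keep zero counts finite.

```python
import math
from statistics import mean
from typing import List


def log2_fold_change(control: List[float], treated: List[float], pseudocount: float = 1.0) -> float:
    # Ratio of mean normalized counts (treated over control) on a log2 scale;
    # positive values mean higher abundance in the treated group.
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


if __name__ == "__main__":
    # Made-up normalized counts for one gene, three replicates per condition.
    print(log2_fold_change([9, 11, 10], [39, 41, 43]))
```

On the log2 scale a doubling gives +1 and a halving gives -1, so up- and down-regulation are symmetric around 0.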

Give the most important plots from the report:

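Volcano plots (log2 fold change against significance for each gene) for the two contrasts:

![Volcano plot for contrast treatment_mCherry_hND6_](./results/plots/differential/treatment_mCherry_hND6_/png/volcano.png)
![Volcano plot for contrast treatment_mCherry_hND6_sample_number](./results/plots/differential/treatment_mCherry_hND6_sample_number/png/volcano.png)

Exploratory boxplot of per-sample abundance values for the `treatment` variable:

![Exploratory boxplot for the treatment variable](./results/plots/exploratory/treatment/png/boxplot.png)

DESeq2 dispersion estimates for the two contrasts:

![DESeq2 dispersion plot for contrast treatment_mCherry_hND6_](./results/plots/qc/treatment_mCherry_hND6_.deseq2.dispersion.png)
![DESeq2 dispersion plot for contrast treatment_mCherry_hND6_sample_number](./results/plots/qc/treatment_mCherry_hND6_sample_number.deseq2.dispersion.png)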
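A volcano plot places each gene at (log2 fold change, -log10 adjusted p-value), so genes that change strongly and significantly end up in the upper corners. A minimal sketch of that mapping and of a common up/down/not-significant labelling; the thresholds (|log2FC| >= 1, padj < 0.05) and the gene names are illustrative assumptions, not the pipeline's configured defaults.

```python
import math
from typing import Dict, Tuple


def volcano_point(log2fc: float, padj: float, fc_cut: float = 1.0,
                  p_cut: float = 0.05) -> Tuple[float, float, str]:
    # x = effect size, y = significance on a -log10 scale (smaller padj -> higher point).
    y = -math.log10(max(padj, 1e-300))  # guard against padj == 0
    if padj < p_cut and log2fc >= fc_cut:
        label = "up"
    elif padj < p_cut and log2fc <= -fc_cut:
        label = "down"
    else:
        label = "ns"
    return log2fc, y, label


if __name__ == "__main__":
    # Made-up results: gene -> (log2 fold change, adjusted p-value).
    results: Dict[str, Tuple[float, float]] = {
        "geneA": (2.3, 1e-6),
        "geneB": (-1.5, 0.01),
        "geneC": (0.2, 0.8),
        "geneD": (3.0, 0.2),
    }
    for gene, (fc, p) in results.items():
        x, y, label = volcano_point(fc, p)
        print(f"{gene}: x={x:+.1f} y={y:.2f} {label}")
```

Requiring both thresholds keeps large but noisy changes (like `geneD`) and tiny but significant ones out of the highlighted groups.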