# Computational Workflows for Biomedical Data

Welcome to the course Computational Workflows for Biomedical Data. Over the next two weeks, you will learn how to leverage nf-core pipelines to analyze biomedical data and gain hands-on experience in creating your own pipelines, with a strong emphasis on Nextflow and nf-core.

Course Structure:

- Week 1: You will use a variety of nf-core pipelines to analyze a publicly available biomedical study.
- Week 2: We will shift focus to learning the basics of Nextflow, enabling you to design and implement your own computational workflows.
- Final Project: During the last couple of days, you will apply your knowledge to create a custom pipeline for analyzing biomedical data using Nextflow and the nf-core template.

## Basics

If you have not installed all the required software yet, please do so as soon as possible!


If you have already installed all the software, please continue by answering the questions in this notebook. If you have any questions, don't hesitate to ask us.

1. What is nf-core?

- nf-core is a community-driven effort that develops and curates a collection of analysis pipelines built with Nextflow, a workflow management system used mainly in bioinformatics. nf-core has a wide range of users: facilities, developers and individual users.

2. How many pipelines are there currently in nf-core?

- At the time of writing, there are 112 pipelines available as part of nf-core.

3. Are there any non-bioinformatic pipelines in nf-core?

- The catalogue of pipelines on the nf-core website consists mainly of bioinformatics pipelines, but there are a few exceptions from other fields, for example an astronomy pipeline and a remote-sensing pipeline.

Examples:
- meerpipe: astronomy pipeline that processes MeerKAT pulsar data to produce images and data products for pulsar timing analysis
- rangeland: pipeline for remotely sensed imagery

4. Let's go back a couple of steps. What is a pipeline and what do we use it for?

- A pipeline is a workflow of data-processing steps connected in series, so that the output of one step becomes the input of the next. The main goal is to automate a commonly used series of steps.
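To make the idea concrete, here is a minimal Python sketch of a linear pipeline in which each step consumes the previous step's output. The step functions (`trim`, `filter_short`, `count_reads`) and the read strings are hypothetical toy stand-ins, not real nf-core processes; a workflow manager such as Nextflow adds parallel execution, containers and caching on top of this basic chaining.

```python
from functools import reduce

# Hypothetical toy steps: each takes the previous step's output as its input.
def trim(reads):
    """Strip trailing 'N' bases from every read."""
    return [r.rstrip("N") for r in reads]

def filter_short(reads, min_len=4):
    """Keep only reads with at least `min_len` bases."""
    return [r for r in reads if len(r) >= min_len]

def count_reads(reads):
    """Final step: summarise the data as a single number."""
    return len(reads)

def run_pipeline(data, steps):
    """Apply the steps in series, feeding each step's output into the next."""
    return reduce(lambda result, step: step(result), steps, data)

n = run_pipeline(["ACGTNN", "ACN", "GGGTTA"], [trim, filter_short, count_reads])
print(n)  # 2
```

Here "ACN" is trimmed to "AC" and then dropped by `filter_short`, so two reads remain at the end.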

5. Why do you think nf-core adheres to strict guidelines?

- Because nf-core wants its pipelines to work reliably for a wide range of users. The guidelines therefore require well-structured pipelines and extensive documentation for each pipeline, including installation and usage guides and a description of the output files.

- A shared structure also makes every pipeline easier to follow.

6. What are the main features of nf-core pipelines?

- First of all, each pipeline should be reusable. As mentioned above, every pipeline comes with thorough documentation that supports this reusability and standardization. All pipelines are built with Nextflow, have stable releases and are open source. They can be run on many different computing environments, and their software dependencies are provided automatically through containers such as Docker. This makes installation easy and ensures that we get the same results even when running a pipeline on different laptops.

## Let's start using the pipelines

1. Find the nf-core pipeline used to measure differential abundance of genes

It is called nf-core/differentialabundance.

In [1]:
# run the pipeline in a cell 
# Docker is needed to run the pipeline; otherwise we would have to install all of the pipeline's software packages in our own environment


# to run bash in jupyter notebooks, simply use ! before the command
# e.g.

!pwd

# For the tasks in the first week, please use the command line to run your commands and simply paste the commands you used in the respective cells!


/home/tabea/ComputationalWorkflows/Tag1


In [2]:
# run the pipeline in the test profile using docker containers
# make sure to specify the version you want to use (use the latest one)

!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir /home/tabea/ComputationalWorkflows/Tag1/test
# This command was run in an Ubuntu terminal instead, because running it from the notebook failed with: Command error:
# docker: permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Head "http://%2Fvar%2Frun%2Fdocker.sock/_ping": dial unix /var/run/docker.sock: connect: permission denied.
# In the terminal it ran successfully. The output was saved in the "test" folder.

# Duration: 5 min 14 sec
# 21 succeeded


Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `https://github.com/nf-core/differentialabundance` [cheeky_carlsson] DSL2 - revision: 3dd360fed0 [1.5.0]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    : 1.5.0
  runNa

In [3]:
!docker --version

Docker version 27.2.0, build 3ab4256


In [4]:
# repeat the run. What changed?
!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker --outdir /home/tabea/ComputationalWorkflows/Tag1/test2

# it ran much faster: the first run took 5 min 14 sec, the second run 1 min 29 sec
# (likely because the pipeline code and the Docker images had already been downloaded during the first run)
# 21 succeeded, the same as in the first run

Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `https://github.com/nf-core/differentialabundance` [reverent_lalande] DSL2 - revision: 3dd360fed0 [1.5.0]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    : 1.5.0
  runN

In [5]:
# now add -resume to the command. What changed?
!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker -resume --outdir /home/tabea/ComputationalWorkflows/Tag1/test3

# The run was even faster than the previous ones (17.3 s),
# but only 3 tasks were executed (succeeded) while the other 18 were taken from the cache


Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `https://github.com/nf-core/differentialabundance` [cranky_lorenz] DSL2 - revision: 3dd360fed0 [1.5.0]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    : 1.5.0
  runName
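Nextflow decides which tasks it can skip on `-resume` by computing a hash for each task from its inputs and script (plus further details) and looking for a completed result with the same hash in the `work` directory. The following is a simplified Python model of that idea, not Nextflow's actual implementation: the functions `task_hash` and `run_tasks`, the task names and the file names are all hypothetical, and the dict `work` merely stands in for the `work/` directory.

```python
import hashlib
import json

def task_hash(name, script, inputs):
    """Simplified task key; Nextflow's real hash covers more (input file metadata, container, ...)."""
    payload = json.dumps({"name": name, "script": script, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_tasks(tasks, cache, resume=False):
    """Run (name, script, inputs) tasks; `cache` maps task hash -> output and stands in for work/."""
    status = {"succeeded": 0, "cached": 0}
    for name, script, inputs in tasks:
        key = task_hash(name, script, inputs)
        if resume and key in cache:
            status["cached"] += 1             # reuse the stored result instead of re-running
        else:
            cache[key] = f"output of {name}"  # pretend to execute the task and store its result
            status["succeeded"] += 1
    return status

work = {}
tasks = [("QC", "qc.sh", ["counts.tsv"]), ("DESEQ2", "deseq2.R", ["counts.tsv", "samples.csv"])]
print(run_tasks(tasks, work))               # {'succeeded': 2, 'cached': 0}
print(run_tasks(tasks, work, resume=True))  # {'succeeded': 0, 'cached': 2}

# Changing one task's script changes its hash, so only that task is re-executed.
changed = [tasks[0], ("DESEQ2", "deseq2_v2.R", ["counts.tsv", "samples.csv"])]
print(run_tasks(changed, work, resume=True))  # {'succeeded': 1, 'cached': 1}

# An empty cache (like a deleted work/ directory) means nothing can be resumed.
print(run_tasks(tasks, {}, resume=True))    # {'succeeded': 2, 'cached': 0}
```

In the real pipeline, a re-executed task also changes the inputs of the tasks downstream of it, so those are re-run as well; this toy model ignores dependencies between tasks.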

Check out the current directory. Next to the outdir you specified, what else has changed?
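Besides the output directory, Nextflow writes its own bookkeeping into the launch directory: `work/` (one subdirectory per task holding intermediate files, which is what `-resume` reuses), the hidden `.nextflow/` directory (run history and cache metadata) and the log file `.nextflow.log` (older logs are rotated to `.nextflow.log.1`, `.nextflow.log.2`, ...). Running `!ls -a` in a cell shows them directly; the Python helper below is a hypothetical convenience function, not part of Nextflow, that lists only these entries.

```python
from pathlib import Path

def nextflow_artifacts(launch_dir="."):
    """List the entries Nextflow typically creates in its launch directory (dirs get a trailing '/')."""
    hits = []
    for entry in sorted(Path(launch_dir).iterdir(), key=lambda p: p.name):
        if entry.name in {"work", ".nextflow"} or entry.name.startswith(".nextflow.log"):
            hits.append(entry.name + ("/" if entry.is_dir() else ""))
    return hits

print(nextflow_artifacts())
```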

In [7]:
# delete the work directory and run the pipeline again using -resume. What changed?

!nextflow run nf-core/differentialabundance -r 1.5.0 -profile test,docker -resume --outdir /home/tabea/ComputationalWorkflows/Tag1/test4


Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.0
Launching `https://github.com/nf-core/differentialabundance` [distraught_brattain] DSL2 - revision: 3dd360fed0 [1.5.0]
WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/differentialabundance v1.5.0-g3dd360f
------------------------------------------------------
Core Nextflow options
  revision                    : 1.5.0
  r

What changed?
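- Now the run took 1 min 34 sec, about as long as test2 (1 min 29 sec).
- All 21 tasks succeeded again and none were cached: `-resume` relies on the task results stored in the `work` directory, so after deleting it there was nothing left to resume from.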
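To put the four runs side by side, the short Python snippet below converts the wall-clock durations reported in the cells above into seconds and computes the speed-up of each run relative to the first one. The labels in the `runs` dict are just descriptive names chosen for this comparison.

```python
# Wall-clock durations reported for the four runs above, in seconds
runs = {
    "test (first run)": 5 * 60 + 14,
    "test2 (repeat)": 1 * 60 + 29,
    "test3 (-resume)": 17.3,
    "test4 (-resume, work/ deleted)": 1 * 60 + 34,
}

baseline = runs["test (first run)"]
for name, seconds in runs.items():
    print(f"{name:32s} {seconds:7.1f} s   speed-up vs first run: {baseline / seconds:5.2f}x")
```

The resumed run is roughly 18 times faster than the first one, while the repeat run and the run with a deleted `work` directory both land at roughly a 3.3 to 3.5 times speed-up.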

## Let's look at the results

### What is differential abundance analysis?

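Differential abundance analysis (DAA) is a statistical method that tests whether the abundance of features differs between groups of samples, for example between treatments or conditions. In microbiome studies the features are typically microbial taxa; in nf-core/differentialabundance they are, for example, genes from RNA-seq count data. Different DAA tools can produce different results on the same data set.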

(Sources: https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-022-01320-0, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02655-5)
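A central quantity in differential abundance analysis is the log2 fold change of a feature between two groups after correcting for sequencing depth. The Python sketch below illustrates this with counts-per-million (CPM) normalisation and made-up counts; the function names and data are hypothetical. Real tools such as DESeq2 (whose dispersion plot is among the report plots) use more sophisticated size-factor normalisation and a negative binomial model to test significance, which this sketch does not do.

```python
import math

def cpm(sample):
    """Counts-per-million: divide each gene's count by the sample's total count, times 1e6."""
    total = sum(sample.values())
    return {gene: count / total * 1e6 for gene, count in sample.items()}

def log2_fold_change(group_a, group_b, gene, pseudocount=1.0):
    """log2 of the mean normalised abundance in group_b relative to group_a.
    The pseudocount avoids dividing by zero or taking log(0) for unexpressed genes."""
    mean_a = sum(cpm(s)[gene] for s in group_a) / len(group_a)
    mean_b = sum(cpm(s)[gene] for s in group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))

# Made-up counts; the second sample of each group was sequenced twice as deeply.
control = [{"geneA": 10, "other": 990}, {"geneA": 20, "other": 1980}]
treated = [{"geneA": 40, "other": 960}, {"geneA": 80, "other": 1920}]

lfc = log2_fold_change(control, treated, "geneA")
print(round(lfc, 2))  # 2.0, i.e. geneA is about 4x more abundant in the treated group
```

Without normalisation, the raw counts of the deeper samples (20 and 80) would look like extra expression; dividing by each sample's total removes that depth effect before the groups are compared.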

Give the most important plots from the report:

- The plots look identical across the different runs.

![Exploratory boxplot by treatment](test/plots/exploratory/treatment/png/boxplot.png)

![Exploratory density plot by treatment](test/plots/exploratory/treatment/png/density.png)

![DESeq2 dispersion plot (QC)](test/plots/qc/treatment_mCherry_hND6_sample_number.deseq2.dispersion.png)

![Sample dendrogram by treatment](test/plots/exploratory/treatment/png/sample_dendrogram.png)