# Computational Workflows for Biomedical Data

Welcome to the course Computational Workflows for Biomedical Data. Over the next two weeks, you will learn how to leverage nf-core pipelines to analyze biomedical data and gain hands-on experience in creating your own pipelines, with a strong emphasis on Nextflow and nf-core.

Course Structure:

- Week 1: You will use a variety of nf-core pipelines to analyze a publicly available biomedical study.
- Week 2: We will shift focus to learning the basics of Nextflow, enabling you to design and implement your own computational workflows.
- Final Project: In the last couple of days, you will apply your knowledge to create a custom pipeline for analyzing biomedical data using Nextflow and the nf-core template.

## Basics

If you have not installed all required software, please do so as soon as possible!


If you have already installed all software, go ahead and start answering the questions in this notebook. If you have any questions, don't hesitate to approach us.

1. What is nf-core?

A community-based platform that hosts a set of curated Nextflow pipelines for various types of analyses.

2. How many pipelines are there currently in nf-core?

112 pipelines are currently available, of which 66 are released.

3. Are there any non-bioinformatic pipelines in nf-core?

Yes, there are also pipelines for economics and astronomy.

4. Let's go back a couple of steps. What is a pipeline and what do we use it for?

It is an automated process consisting of multiple organized steps, where the output from one step is the input to the next. We use them to automate a process and allow multiple people to use the same workflow and obtain the same results.
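The idea of chaining steps can be illustrated with a minimal Python sketch. The step functions (`trim`, `count_gc`, `summarize`) and the toy reads are hypothetical and chosen only for illustration; a real Nextflow pipeline expresses the same idea with processes and channels.

```python
from functools import reduce


def trim(reads):
    """Step 1: drop reads shorter than 4 bases (toy quality filter)."""
    return [r for r in reads if len(r) >= 4]


def count_gc(reads):
    """Step 2: compute the GC fraction of each remaining read."""
    return [sum(base in "GC" for base in r) / len(r) for r in reads]


def summarize(gc_fractions):
    """Step 3: reduce the per-read values to a single mean."""
    return sum(gc_fractions) / len(gc_fractions)


def run_pipeline(data, steps):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda value, step: step(value), steps, data)


# Hypothetical input data, made up for this example.
reads = ["ACGT", "GG", "GGCC", "ATAT"]
result = run_pipeline(reads, [trim, count_gc, summarize])
print(result)
```

Because the steps and their order are fixed in code, anyone who runs the same pipeline on the same input obtains the same result.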

5. Why do you think nf-core adheres to strict guidelines?

To ensure reproducibility, so that the same pipeline always yields the same result regardless of OS, user, etc. The guidelines also help ensure that the pipelines work correctly and without errors. The comprehensive documentation helps users easily run the pipeline with the correct inputs and commands.

6. What are the main features of nf-core pipelines?

Processes, channels connecting the processes, and (sub)workflows of one or more processes make up the pipelines.
Pipelines have clear documentation, their software is packaged, and they have stable releases to ensure reproducibility.

## Let's start using the pipelines

1. Find the nf-core pipeline used to measure differential abundance of genes:

In [1]:
# run the pipeline in a cell 
# to run bash in jupyter notebooks, simply use ! before the command
# e.g.

!pwd


# For the tasks in the first week, please use the command line to run your commands and simply paste the commands you used in the respective cells!


/Users/Jessie/PycharmProjects/comp_workflows/day1


In [2]:
# run the pipeline in the test profile using docker containers
# make sure to specify the version you want to use (use the latest one)


!nextflow run nf-core/differentialabundance -profile test,docker --outdir work/results -c config_cou.config -r 1.5.0



[1m[38;5;232m[48;5;43m N E X T F L O W [0;2m  ~  [mversion 24.04.4[m
[K
Launching[35m `https://github.com/nf-core/differentialabundance` [0;2m[[0;1;36mscruffy_banach[0;2m] DSL2 - [36mrevision: [0;36m3dd360fed0 [1.5.0][m
[K
[33mWARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`[39m[K


-[2m----------------------------------------------------[0m-
                                        [0;32m,--.[0;30m/[0;32m,-.[0m
[0;34m        ___     __   __   __   ___     [0;32m/,-._.--~'[0m
[0;34m  |\ | |__  __ /  ` /  \ |__) |__         [0;33m}  {[0m
[0;34m  | \| |       \__, \__/ |  \ |___     [0;32m\`-._,-`-,[0m
                                        [0;32m`._,._,'[0m
[0;35m  nf-core/differentialabundance v1.5.0-g3dd360f[0m
-[2m----------------------------------------------------[0m-
[1mCore Nextflow options[0m
  [0;34mrevision                    : [0;32m1.5

In [1]:
# repeat the run. What changed?

The run started from scratch again, and new subdirectories were created in the work directory for each process. The run took a similar amount of time to finish, since all processes had to be re-run. The results themselves were identical to the first run.

In [4]:
# now add -resume to the command. What changed?
!nextflow run nf-core/differentialabundance -profile test,docker --outdir work/results -c config_cou.config -r 1.5.0 -resume


[1m[38;5;232m[48;5;43m N E X T F L O W [0;2m  ~  [mversion 24.04.4[m
[K
Launching[35m `https://github.com/nf-core/differentialabundance` [0;2m[[0;1;36minfallible_wing[0;2m] DSL2 - [36mrevision: [0;36m3dd360fed0 [1.5.0][m
[K
[33mWARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`[39m[K


-[2m----------------------------------------------------[0m-
                                        [0;32m,--.[0;30m/[0;32m,-.[0m
[0;34m        ___     __   __   __   ___     [0;32m/,-._.--~'[0m
[0;34m  |\ | |__  __ /  ` /  \ |__) |__         [0;33m}  {[0m
[0;34m  | \| |       \__, \__/ |  \ |___     [0;32m\`-._,-`-,[0m
                                        [0;32m`._,._,'[0m
[0;35m  nf-core/differentialabundance v1.5.0-g3dd360f[0m
-[2m----------------------------------------------------[0m-
[1mCore Nextflow options[0m
  [0;34mrevision                    : [0;32m1.

Cached files from the previous run were used. Since each individual process did not need to be re-run, the pipeline ran much faster overall (1min 26s) and no new subdirectories were created in the work folder. The results remain identical.
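The caching behaviour observed here can be mimicked with a small Python sketch. This is a simplified illustration, not Nextflow's actual implementation: the key scheme in `task_key`, the `result.json` file name, and the `run_task` helper are all assumptions made for this example. Unlike Nextflow, the sketch also reuses the same task directory on a non-resumed rerun instead of creating a new one.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def task_key(name, inputs):
    """Derive a stable key from the task name and its inputs (hypothetical scheme)."""
    payload = json.dumps({"name": name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_task(work_dir, name, inputs, func, resume):
    """Run func(inputs), or reuse a stored result when resume is set and one exists."""
    task_dir = Path(work_dir) / task_key(name, inputs)
    result_file = task_dir / "result.json"
    if resume and result_file.exists():
        # Same name and inputs as a previous run: skip the computation.
        return json.loads(result_file.read_text()), "cached"
    task_dir.mkdir(parents=True, exist_ok=True)
    result = func(inputs)
    result_file.write_text(json.dumps(result))
    return result, "executed"


with tempfile.TemporaryDirectory() as work:
    first = run_task(work, "sum", [1, 2, 3], sum, resume=False)
    second = run_task(work, "sum", [1, 2, 3], sum, resume=True)
    changed = run_task(work, "sum", [1, 2, 4], sum, resume=True)
    print(first, second, changed)
```

In this sketch, the second call is served from the stored result, while the call with a changed input is executed again because its key no longer matches anything in the work directory.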

Check out the current directory. Besides the outdir you specified, what else has changed?

From the first and second runs, there are many subdirectories containing the outputs of individual processes. There are also two stage directories, and a log file is created for each run. The tmp directory contains the samplesheets for the samples.

In [2]:
# delete the work directory and run the pipeline again using -resume. What changed?


What changed?
Since there are no cached files this time, the run starts from scratch and performs each process again, producing a new work directory.

## Let's look at the results

### What is differential abundance analysis?

It compares observations in data matrices to generate differential statistics, for example when comparing gene expression between samples under different treatment conditions. It reveals whether there is a statistically significant difference in gene expression between samples.
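One of the simplest differential statistics is the log2 fold change between two conditions, which is the value a volcano plot typically shows on its x-axis. The sketch below computes it for made-up expression values; the gene names, numbers, and the `log2_fold_change` helper are hypothetical, and the sketch does not include the statistical testing needed to judge significance.

```python
import math
from statistics import mean

# Hypothetical normalized expression values, made up for illustration.
expression = {
    "geneA": {"control": [10.0, 10.0, 10.0], "treated": [40.0, 40.0, 40.0]},
    "geneB": {"control": [8.0, 8.0], "treated": [8.0, 8.0]},
}


def log2_fold_change(control, treated):
    """Log2 ratio of the mean treated value to the mean control value."""
    return math.log2(mean(treated) / mean(control))


fold_changes = {
    gene: log2_fold_change(values["control"], values["treated"])
    for gene, values in expression.items()
}
print(fold_changes)
```

A log2 fold change of 2 means the mean value is four times higher in the treated samples, while 0 means no difference between the conditions.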

Give the most important plots from the report:

![title](./work/results/plots/differential/treatment_mCherry_hND6_/png/volcano.png)

![title](./work/results/plots/exploratory/treatment/png/density.png)

![title](./work/results/plots/exploratory/treatment/png/boxplot.png)

![title](./work/results/plots/exploratory/treatment/png/pca3d.png)

![title](./work/results/plots/exploratory/treatment/png/pca2d.png)