
apetkau/nf-core-assemblyexample


Example nf-core

This repository demonstrates some basic steps for creating a pipeline with Nextflow and nf-core. It is supplementary material for a presentation, listed below:

Additional information about creating a pipeline can also be found in the nf-core documentation: https://nf-co.re/docs/contributing/adding_pipelines.

The files in this repository started from the default nf-core template, which was created with the command:

nf-core create --name assemblyexample --description "Example assembly pipeline" --plain --author "Aaron Petkau"

The following steps walk through adapting this template to run the pipeline defined in https://github.com/apetkau/assembly-nf/.

The README file originally generated by nf-core create is kept as README.md.ORIG.

Setup

Before starting, make sure that nextflow and nf-core are installed. Both are available from the bioconda and conda-forge channels and can be installed with conda using:

conda create --name nextflow -c conda-forge -c bioconda nextflow nf-core
conda activate nextflow
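Before continuing, a quick check that both tools actually ended up on your PATH can save a confusing failure later. This is a minimal POSIX shell sketch; the function name require_tools is hypothetical and not part of nextflow or nf-core.

```sh
#!/bin/sh
# Hypothetical helper: report any required command that is not on PATH.
# Returns 0 when every command is found, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run require_tools nextflow nf-core after conda activate nextflow; a non-zero exit status usually means the environment is not active or the install did not complete.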

Step 1. Running initial template pipeline

# Checkout necessary files
git checkout step1
cd example-execution

# Run pipeline
nextflow run ../ --input samplesheet.csv --outdir results -profile singularity --genome hg38 --max_memory 8.GB --max_cpus 4

You can omit --max_memory and --max_cpus if you wish to use the defaults (defined in nextflow.config). However, you may need to adjust these values depending on the machine you run the pipeline on. They cap the maximum resources that any single process can request (see https://nf-co.re/docs/usage/configuration#max-resources).
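If you are unsure which values to pick, they can be derived from the machine itself. The following is a sketch that assumes Linux (nproc and /proc/meminfo); keeping 80% of total memory as the cap is an arbitrary margin chosen for illustration, not an nf-core default. It only prints the command so you can review it before running.

```sh
#!/bin/sh
# Sketch (Linux only): derive --max_cpus and --max_memory from this machine.
cpus=$(nproc)
# MemTotal is reported in kB; keep 80% of it and convert to whole GB (rounding down).
mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
mem_gb=$(( mem_kb * 8 / 10 / 1024 / 1024 ))
# Never cap memory below 1 GB.
if [ "$mem_gb" -lt 1 ]; then
    mem_gb=1
fi
echo "nextflow run ../ --input samplesheet.csv --outdir results -profile singularity --genome hg38 --max_memory ${mem_gb}.GB --max_cpus ${cpus}"
```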

Step 2. Adding processes

To add processes to the workflow, we will start with the three processes (FASTP, MEGAHIT, QUAST) from https://github.com/apetkau/assembly-nf/blob/main/main.nf. These will be split into separate files and added to modules/local/. That is, we will add the following files:

  • modules/local/fastp.nf
  • modules/local/megahit.nf
  • modules/local/quast.nf

We will have to modify these files to replace any val(sample_id) with val(meta) and ${sample_id} with ${meta.id}, because nf-core passes a map of sample metadata (meta) alongside the files in each channel element; meta.id is the sample identifier associated with the fastq files.
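The two substitutions can be done mechanically with sed. This sketch handles only the literal patterns val(sample_id) and ${sample_id} named above; any other use (for example a bare $sample_id in a tag directive) still needs a manual edit. The function name to_meta is hypothetical, and it prints the result instead of editing in place so you can review the changes first.

```sh
#!/bin/sh
# Hypothetical helper: rewrite sample_id references to nf-core's meta map.
# Prints the converted file to stdout; the original file is left untouched.
to_meta() {
    # [$] matches a literal dollar sign; { and } are literal in a basic regex.
    sed -e 's/val(sample_id)/val(meta)/g' \
        -e 's/[$]{sample_id}/${meta.id}/g' "$1"
}
```

For example, to_meta modules/local/fastp.nf > /tmp/fastp.nf, then compare the two files before replacing the original.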

Next, we modify the file workflows/assemblyexample.nf to import the above modules and add the steps to the workflow.

You can now run the updated workflow with essentially the same command (add --max_memory and --max_cpus as in Step 1 if needed):

cd example-execution
nextflow run ../ --input samplesheet.csv --outdir results -profile singularity --genome hg38

To view a summary of all changes, please see https://github.com/apetkau/nf-core-assemblyexample/compare/step1...step2 (you can ignore changes in README.md).

Step 3. Switching to nf-core modules

nf-core provides a large collection of modules that define processes for bioinformatics tools (including fastp, megahit, and quast). To switch to these community-maintained modules, you can do the following.

3.1. Install nf-core modules

To install the nf-core modules, make sure you are in the root of the pipeline directory (the root of this repository, https://github.com/apetkau/nf-core-assemblyexample) and run the following:

nf-core modules install fastp
nf-core modules install megahit
nf-core modules install quast

This will install the modules in modules/nf-core and create a file modules.json to track versions. You can commit these files to git.

3.2. Add modules to workflow

To add the modules to the workflow (workflows/assemblyexample.nf), add an include line for each module. For example, for fastp:

include { FASTP                       } from '../modules/nf-core/fastp/main'

Next, adjust each process call in the workflow wherever the nf-core module's inputs, outputs, or parameters differ from those of the local module it replaces.
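Since the three include lines differ only in the module name, they can be generated rather than typed. This sketch assumes the module directories are fastp, megahit and quast (as installed in 3.1) and that each process name is the upper-case module name; the padding only mimics the column alignment used in the template.

```sh
#!/bin/sh
# Print one include statement per nf-core module used in this pipeline.
include_lines() {
    for module in fastp megahit quast; do
        # Process names are the upper-case module names, e.g. fastp -> FASTP.
        name=$(printf '%s' "$module" | tr '[:lower:]' '[:upper:]')
        # Pad the process name to 27 characters so the closing braces line up.
        printf "include { %-27s } from '../modules/nf-core/%s/main'\n" "$name" "$module"
    done
}
```

Paste the output of include_lines near the other include statements at the top of workflows/assemblyexample.nf.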

3.3. Make adjustments to max memory

You can make adjustments to many of the parameters of the pipeline in nextflow.config. These can be overridden by command-line arguments (such as --max_memory as described above), but it may be useful to adjust the default values here. In particular, the megahit tool is set to use a very large amount of memory by default. You can decrease the default maximum memory by setting max_memory = 8.GB in this file (adjusting the value for your particular use case).
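If you prefer to script this change, a single sed substitution is enough. This sketch assumes the default sits on its own line as a quoted string, e.g. max_memory = '128.GB' (the quoted form also used in conf/test.config); the helper name set_max_memory is hypothetical, and it prints the edited file rather than overwriting it.

```sh
#!/bin/sh
# Hypothetical helper: print a copy of a config file with a new max_memory default.
# Usage: set_max_memory FILE VALUE   (e.g. set_max_memory nextflow.config 8.GB)
set_max_memory() {
    # Keep the indentation and "max_memory =" prefix; replace everything after it.
    sed "s/^\([[:space:]]*max_memory[[:space:]]*=[[:space:]]*\).*/\1'$2'/" "$1"
}
```

For example, set_max_memory nextflow.config 8.GB > nextflow.config.new, then review and move it into place.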

3.4. Remove existing modules

You can now remove the previously created modules/local/{fastp,megahit,quast}.nf files, as they are no longer needed.

3.5. Execute pipeline

You should now be able to execute the pipeline:

cd example-execution
nextflow run ../ --input samplesheet.csv --outdir results -profile singularity --genome hg38

To view a summary of all changes, please see https://github.com/apetkau/nf-core-assemblyexample/compare/step2...step3 (you can ignore changes in README.md).

Step 4. Adjusting parameters

Parameters can be adjusted in the nextflow.config file: defaults can be changed, new parameters added, and others removed.

The simplest way to avoid passing --genome hg38 on every run is to set genome = 'hg38' as the default genome parameter.

However, to remove the parameter entirely, you can delete it from nextflow.config and comment out the following lines:

if (!params.fasta) {
    Nextflow.error "Genome fasta file not specified with e.g. '--fasta genome.fa' or via a detectable config file."
}
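Because params.fasta is referenced in more than one place, it helps to list every occurrence before commenting anything out. This is a sketch with a hypothetical function name; in the 2.9 template the check above is, to my knowledge, in lib/WorkflowAssemblyexample.groovy, and main.nf also sets params.fasta, but treat those locations as assumptions and rely on the grep output instead.

```sh
#!/bin/sh
# List every line that references params.fasta in the given files/directories.
# Usage: find_fasta_refs main.nf lib workflows
find_fasta_refs() {
    grep -rn 'params\.fasta' "$@"
}
```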

For this step, I have chosen to set "hg38" as the default, even though this pipeline does not use it. To view a summary of changes, please see https://github.com/apetkau/nf-core-assemblyexample/compare/step3...step4 (you can ignore changes in README.md).

Step 5. Tests and linting

5.1. Linting

nf-core provides a linter, run with the command nf-core lint, that checks for possible issues (see the nf-core linting documentation for more details). Running it now gives:

Command

nf-core lint

Output

...
│ pipeline_todos: TODO string in nextflow.config: Specify your pipeline's command line flags                                                                      │
│ pipeline_todos: TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)      │
│ pipeline_todos: TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
...

╭───────────────────────╮
│ LINT RESULTS SUMMARY  │
├───────────────────────┤
│ [✔] 193 Tests Passed  │
│ [?]   0 Tests Ignored │
│ [!]  25 Test Warnings │
│ [✗]   0 Tests Failed  │
╰───────────────────────╯

That is, no tests failed, though there are a number of warnings, all related to unresolved TODO statements. We will focus on addressing the testing-related TODOs.
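The pipeline_todos warnings come from "TODO nf-core" comments left in the template files, so a recursive grep gives a per-file checklist. This is a sketch with a hypothetical function name; nf-core lint remains the authoritative check.

```sh
#!/bin/sh
# Print "file:count" for every file with at least one "TODO nf-core" marker.
# Usage: list_todos .    (run from the pipeline root)
list_todos() {
    # -c counts matches per file; drop the files with zero matches.
    grep -rc 'TODO nf-core' "$@" | grep -v ':0$'
}
```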

5.2. Testing

nf-core also provides profiles that are intended to be used to run the pipeline with test data (see the nf-core pipeline testing tutorial for details). To do this, we can run the below command:

Command

nextflow run . -profile docker,test --outdir results

Output

executor >  local (15)
[7f/08ba89] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_test_illumina_amplicon.csv) [100%] 1 of 1 ✔
[3c/da189b] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:FASTQC (SAMPLE1_PE_T1)                                                 [100%] 4 of 4 ✔
[c2/3aea54] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:FASTP (SAMPLE1_PE_T1)                                                  [100%] 4 of 4 ✔
[0b/4a2548] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:MEGAHIT (SAMPLE1_PE_T1)                                                [100%] 4 of 4 ✔
[-        ] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:QUAST                                                                  -
[47/11da11] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:CUSTOM_DUMPSOFTWAREVERSIONS (1)                                        [100%] 1 of 1 ✔
[21/266316] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:MULTIQC                                                                [100%] 1 of 1 ✔
-[nf-core/assemblyexample] Pipeline completed successfully-
Completed at: 15-Aug-2023 15:54:16
Duration    : 1m 47s
CPU hours   : 0.1
Succeeded   : 15

This runs the pipeline with a minimal dataset and configured parameters as defined in https://github.com/apetkau/nf-core-assemblyexample/blob/step5/conf/test.config.

params {
    config_profile_name        = 'Test profile'
    config_profile_description = 'Minimal test dataset to check pipeline function'

    // Limit resources so that this can run on GitHub Actions
    max_cpus   = 2
    max_memory = '6.GB'
    max_time   = '6.h'

    // Input data
    // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
    // TODO nf-core: Give any required params for the test so that command line flags are not needed
    input  = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv'

    // Genome references
    genome = 'R64-1-1'
}

You may need to update this if the pipeline tests fail to run.
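After a test run, a quick way to confirm that each step produced output is to check for the expected folders under --outdir. This sketch assumes the template's default publishDir layout (one lower-case folder per process, plus multiqc and pipeline_info); the helper name and the folder names you pass to it are assumptions to adjust for your pipeline.

```sh
#!/bin/sh
# Hypothetical helper: check that expected sub-directories exist in an output folder.
# Usage: check_results OUTDIR DIR...   Returns 1 if any directory is missing.
check_results() {
    outdir=$1
    shift
    status=0
    for dir in "$@"; do
        if [ -d "$outdir/$dir" ]; then
            echo "found:   $outdir/$dir"
        else
            echo "missing: $outdir/$dir"
            status=1
        fi
    done
    return "$status"
}
```

For example: check_results results fastqc fastp megahit multiqc pipeline_info.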

There are two types of testing profiles: (1) test (for small-scale testing) and (2) test_full (for full-sized dataset testing).

Let's try to run test_full.

Command

nextflow run . -profile docker,test_full --outdir results

Output

[11/56325e] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_full_illumina_amplicon.csv) [100%] 1 of 1 ✔
[b5/a1ea7d] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:FASTQC (sample2_T1)                                                    [100%] 2 of 2 ✔
[3a/57842f] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:FASTP (sample2_T1)                                                     [100%] 2 of 2 ✔
[fd/af2df1] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:MEGAHIT (sample2_T1)                                                   [100%] 2 of 2 ✔
[-        ] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:QUAST                                                                  -
[9b/b078b7] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:CUSTOM_DUMPSOFTWAREVERSIONS (1)                                        [100%] 1 of 1 ✔
[be/20ded5] process > NFCORE_ASSEMBLYEXAMPLE:ASSEMBLYEXAMPLE:MULTIQC                                                                [100%] 1 of 1 ✔
-[nf-core/assemblyexample] Pipeline completed successfully-
Completed at: 15-Aug-2023 16:19:25
Duration    : 16m 46s
CPU hours   : 1.0
Succeeded   : 9

This succeeded as well, but on larger files (compare the running time of ~2 min for test to ~17 min for test_full).

Since both the test and test_full profiles succeeded with the defaults provided by nf-core, the only changes needed are to remove the TODO statements in the respective files, conf/test.config and conf/test_full.config.

To view a summary of changes, please see https://github.com/apetkau/nf-core-assemblyexample/compare/step4...step5 (you can ignore changes in README.md).

Step 6: CI with GitHub Actions

Continuous Integration (CI) is the practice of frequently committing/merging code into a shared repository and automatically running tests, giving rapid feedback that catches issues in new code (see https://www.atlassian.com/continuous-delivery/continuous-integration). GitHub Actions is GitHub's service for running continuous integration workflows. nf-core provides a comprehensive set of GitHub Actions workflows that run linting and pipeline tests on code (see nf-core testing for details).

In this step, we will make the necessary changes in order to get the nf-core CI workflows configured on GitHub.

6.1. Create branch and pull-request

The first step is to create a separate branch for these changes and open a pull request from it. This can be done with:

git checkout -b step/ci-tests
# Make changes to some files here and commit
git push origin step/ci-tests

On GitHub, we then open a pull request for this branch (the step 6 pull request), targeting the dev branch. Opening it triggers the GitHub Actions workflows already configured by the nf-core template.

6.2. Fix existing tests

In the GitHub Actions CI tests for the current code, there is one failure, in the nf-core linting / Prettier check, namely:

Run prettier --check ${GITHUB_WORKSPACE}
Checking formatting...
[warn] modules.json
[warn] README.md
[warn] Code style issues found in 2 files. Run Prettier to fix.
Error: Process completed with exit code 1.

The command prettier is used to check for consistent formatting of code, and is described in the nf-core code formatting documentation.

The above error messages indicate some files are failing the prettier check. To run prettier manually to reproduce the CI test issues, we can use the following commands:

Command

conda install prettier
prettier --check .

Output

Checking formatting...
[warn] modules.json
[warn] README.md
[warn] Code style issues found in 2 files. Run Prettier to fix.

To make the necessary changes, the following can be run:

prettier -w .

Now, commit the changed files and push again to re-run the CI tests. All tests should now pass.


These tests are divided up into two categories (see nf-core testing docs for details).

  • Lint tests: These verify that the code conforms to nf-core specifications. They include running nf-core lint and prettier (code formatting check), as well as EditorConfig checker and Python Black (other code formatting checks).
  • Pipeline tests: These tests run the pipeline by running the command nextflow run . -profile test,docker --outdir ./results (i.e., running the pipeline using the test profile and data configured in Step 5).

Configuration for the GitHub Actions workflows can be found in the .github/workflows directory.

6.3. Fix up CI-related TODOs/other small fixes

We will also fix any CI-related TODOs. These can be reviewed by running nf-core lint, which should show:

│ pipeline_todos: TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required                                                            │
│ pipeline_todos: TODO string in ci.yml: You can customise CI pipeline run tests as required

6.3.1. AWS testing

The first TODO relates to .github/workflows/awsfulltest.yml, which runs a test on the full dataset (the test_full profile) when the pipeline is released. We are not going to use AWS right now, so this TODO can be ignored.

6.3.2. ci.yml customization

This TODO is related to .github/workflows/ci.yml, and describes how the test command can be modified here. We can remove this TODO statement as we do not need to modify the command.

6.3.3. Automatic comments on PR

The .github/workflows/lint_comment.yml GitHub Actions workflow will add comments following linting of a PR on any detected issues. In order for this to work properly, you will need to go to Settings > Actions > General > Workflow permissions in the GitHub repository, and make sure to enable read and write permissions (as described here https://github.com/marocchino/sticky-pull-request-comment).

6.3.4. All changes

All these changes can be viewed in the following URL: https://github.com/apetkau/nf-core-assemblyexample/compare/1f2ebe6a4289dae51b3496f1358cd210a5c255e7...43c894209ba37e4d5755e6ea425fa0efa5ddbc80.

6.4. Finish PR

With the necessary changes made, push to the branch step/ci-tests and verify that the tests pass.

The passing tests and code changes can all be reviewed in the step 6 pull-request. You can also review the code changes needed at https://github.com/apetkau/nf-core-assemblyexample/compare/step5...step6.

Step 7: Updating modules

Modules installed from nf-core may need to be updated to their latest versions over time. This can be done with the command:

nf-core modules update

This command will ask a number of questions on how you wish to proceed with updating and will then update modules in modules/nf-core/.

Once everything is finished, run git diff to view the differences. If you are happy with them, commit and create a PR (which will run the tests).

You can review the code changes performed at https://github.com/apetkau/nf-core-assemblyexample/compare/step6...step7.

Step 8: nf-core sync

When the nf-core tools are updated, you may need to resynchronize your pipeline with the nf-core template, in particular by updating and merging in the TEMPLATE branch. This is described in more detail at https://nf-co.re/docs/contributing/sync. This step goes through the process of synchronizing with nf-core between versions 2.9 and 2.10.

8.1. Check and update nf-core software

To find the latest version of the nf-core tools, you can look through the GitHub repository at https://github.com/nf-core/tools. The latest version (at the time of writing) is 2.10. You can check which version you have installed by running:

nf-core --version

If your version is the latest, then there should be no need to sync with nf-core, so you can skip Step 8. However, if your version is behind, then you may need to synchronize with nf-core.
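When comparing versions, note that a plain string comparison gets 2.9 versus 2.10 wrong. This sketch (hypothetical function name) compares dot-separated numeric components instead; treating a missing component as 0 is a simplifying assumption.

```sh
#!/bin/sh
# Return 0 if version $1 is older than version $2 (dot-separated numbers).
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        # Compare component by component; missing components count as 0.
        for (i = 1; i <= n; i++) {
            if ((x[i] + 0) < (y[i] + 0)) exit 0
            if ((x[i] + 0) > (y[i] + 0)) exit 1
        }
        exit 1  # equal versions are not "less than"
    }'
}
```

For example, assuming nf-core --version prints the version as its last word (worth checking on your install): installed=$(nf-core --version | awk '{ print $NF }'), then version_lt "$installed" 2.10 && echo "nf-core is older than 2.10".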

If needed, you can update to the latest version of nf-core with (assuming nf-core was installed via conda):

conda update nf-core

8.2. nf-core sync

Once nf-core is updated, to synchronize you can run:

nf-core sync

This should update the TEMPLATE branch on your git repository to the latest version. Please see the nf-core sync documentation for more details.

8.3. Merge TEMPLATE branch

Once you have synchronized, you can use the following command to merge the TEMPLATE branch into your current code.

git merge TEMPLATE

This may lead to merge conflicts, which you will have to fix manually. You can see which files need fixing in the case of merge conflicts by running:

git status

Output

Unmerged paths:
  (use "git add <file>..." to mark resolution)
        both modified:   README.md
        both modified:   modules.json
        both modified:   modules/nf-core/custom/dumpsoftwareversions/main.nf
        both modified:   modules/nf-core/fastqc/main.nf
        both modified:   modules/nf-core/multiqc/main.nf

In this case, I want to keep my own copy of README.md without merging. To do this, you can use the following command:

git checkout --ours README.md
git add README.md

Here, --ours means the version of README.md prior to merging TEMPLATE (you can use --theirs to keep the version found in the TEMPLATE branch).

I am also going to keep my own copies of all the nf-core modules, since I will update them afterwards. To keep our version of all files, you can run:

git checkout --ours .
git add README.md modules.json modules/nf-core/custom/dumpsoftwareversions/main.nf modules/nf-core/fastqc/main.nf modules/nf-core/multiqc/main.nf
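Note that git checkout --ours . also restores every non-conflicted file from the index, which discards any unstaged edits in them. A narrower alternative is to resolve only the paths git reports as unmerged, which git diff --name-only --diff-filter=U lists. This is a sketch with a hypothetical function name, and it keeps our side of every conflicted file, so only use it when that is what you want (as here, where the modules are updated afterwards).

```sh
#!/bin/sh
# Hypothetical helper: during a merge, keep "our" side of conflicted files only.
keep_ours_all() {
    # --diff-filter=U selects unmerged (conflicted) paths.
    git diff --name-only --diff-filter=U | while IFS= read -r path; do
        # Take our version, then stage it to mark the conflict as resolved.
        git checkout --ours -- "$path" && git add -- "$path"
    done
}
```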

Next, to finish the merge of TEMPLATE, you will have to make a commit. Please review the output of git status first to make sure you are committing the correct files.

git commit -m "Merged TEMPLATE"

8.4. (Optional) update modules

For this example code, I will also update modules (as described in Step 7: Updating modules).

nf-core modules update

Once you've updated, please make sure to test any of the code changes, make any fixes, and commit.

nextflow run . -profile docker,test --outdir results

You can review the code changes performed at https://github.com/apetkau/nf-core-assemblyexample/compare/step7...step8.
