Nextflow pipeline to detect matched BAMs with NGSCheckMate



Workflow representation


Implementation of NGSCheckMate and its underlying subset calling, distributed per sample.


  1. Nextflow: for common installation procedures see the IARC-nf repository.
  2. NGSCheckMate (follow instructions, especially setting up $NCM_HOME variable)
  3. samtools
  4. bcftools

Additionally, the graph output option requires R; see details below about this option.


--input your input BAM file(s) (do not forget the quotes, e.g. --input "test_*.bam"). Warning: your BAM file(s) must be indexed, and the corresponding test_*.bai files must be in the same folder.
--input_folder Folder with BAM files
--input_file Input file (tab-separated) with 3 columns: ID (individual ID), suffix (suffix for sample names; e.g. RNA), and bam (path to bam file).

A nextflow.config is also included; please modify it to suit environments other than our pre-configured clusters (see Nextflow configuration).

Note that the input_file is a tab-delimited text file; it is used both to provide the input bam file locations and to generate the graphs. The ID field must be unique to a subject (e.g. both tumor and normal samples from the same individual must have the same individual identifier). The bam field must be unique to a file name. For example, the following is a valid file:

ID       suffix  bam
NA06984  _RNA    NA06984_T_transcriptome.bam
NA06984  _WGS    NA06984_T_genome.bam
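The constraints on the input_file described above can be checked before launching the pipeline. The following is a minimal sketch in Python, not part of the pipeline itself; the function name validate_input_file and the wording of its messages are hypothetical. It assumes a tab-delimited file with a header line containing ID, suffix, and bam, and it flags missing columns, empty fields, and bam paths listed more than once. It cannot verify that an ID really corresponds to a single individual.

```python
import csv
from collections import Counter

# Column names taken from the input_file description in this README.
REQUIRED_COLUMNS = ("ID", "suffix", "bam")


def validate_input_file(path):
    """Return a list of problems found in a tab-delimited input_file.

    Hypothetical helper for illustration; an empty list means no problem
    was detected by these simple checks.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        missing = [col for col in REQUIRED_COLUMNS if col not in header]
        if missing:
            # Typical cause: the file is comma- or space-separated.
            return ["missing column(s): " + ", ".join(missing)]
        rows = list(reader)

    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(rows, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_number}: empty {col} field")

    # The bam field must be unique to a file name.
    bam_counts = Counter(row["bam"] for row in rows if row["bam"])
    for bam, count in bam_counts.items():
        if count > 1:
            problems.append(f"bam path listed {count} times: {bam}")
    return problems
```

Calling validate_input_file("input.tsv") (a hypothetical file name) and printing the returned list gives a quick report; an empty list means these checks found nothing wrong.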


  • Mandatory

--output_folder results the folder that will contain the NGSCheckMate folder with all results in text files.
--ref ref.fasta your reference in FASTA
--bed SNP_GRCh38.bed Panel of SNP bed file from NGSCheckMate

Note that a bed file, SNP_GRCh38.bed, is provided; it is a liftOver of the SNP panel files distributed with NGSCheckMate. To use other references, you can provide your own bed file.

  • Optional

--mem 16 Memory requested (in GB) for calling and NGSCheckMate run
--cpu 4 Number of threads for germline calling
--bai_ext .bam.bai Extension of bai files


nextflow run NGSCheckMate-nf/ -r v1.1 -profile singularity --ref ref.fasta --input_folder BAM/

To run the pipeline without singularity, just remove "-profile singularity". Alternatively, one can run the pipeline using a docker container (-profile docker) or the conda recipe containing all required dependencies (-profile conda).


vcfs a folder with the vcfs used for the matching
NCM_output/output*.txt NGSCheckMate output files with matches between files (see
NCM_output/output.pdf hierarchical clustering plot from
NCM_output/NCM_graph_wrongmatch.xgmml graph with only the samples without a match (adapted from
NCM_output/NCM_graph.xgmml graph with all samples (adapted from

Note that we recommend Cytoscape to visualize the .xgmml graphs.
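For a quick look at a graph without opening Cytoscape, the .xgmml files can also be read as plain XML. The sketch below is an illustration only, using Python's standard library; the function names read_xgmml and unlinked_labels are hypothetical. It assumes the files follow the usual XGMML layout, with node elements carrying id and label attributes and edge elements carrying source and target attributes. How the graph script uses edges (for example, whether an edge denotes a match) is not specified here, so interpret the result with care.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}node'."""
    return tag.rsplit("}", 1)[-1]


def read_xgmml(path):
    """Return (nodes, edges) from an XGMML file.

    nodes maps node id -> label; edges is a list of (source, target) ids.
    Assumes standard XGMML attributes (id, label, source, target).
    """
    root = ET.parse(path).getroot()
    nodes = {}
    edges = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "node":
            nodes[element.get("id")] = element.get("label")
        elif name == "edge":
            edges.append((element.get("source"), element.get("target")))
    return nodes, edges


def unlinked_labels(nodes, edges):
    """Labels of nodes that do not appear in any edge, sorted."""
    linked = {node_id for edge in edges for node_id in edge}
    return sorted(label for node_id, label in nodes.items()
                  if node_id not in linked)
```

For example, nodes, edges = read_xgmml("NCM_output/NCM_graph.xgmml") followed by unlinked_labels(nodes, edges) lists the node labels that have no edge in the graph.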

Usage for Cobalt cluster

nextflow run iarcbioinfo/NGSCheckMate -profile cobalt --input "/data/test_*.bam" --output_folder /data/cohort_output --ref /ref/Homo_sapiens_assembly38.fasta --bed /home/user/bin/NGSCheckMate/SNP/SNP_GRCh38.bed


Why are some files not included although they are in the input_folder?

Be careful: if bai files are missing for some bam files, those bam files will be ignored without the workflow returning an error.
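Because bam files without an index are skipped silently, it can help to list them before running the pipeline. The following is a minimal sketch in Python, not part of the pipeline; the function name bams_missing_index is hypothetical. It assumes that the index name is the bam file name with its .bam ending replaced by the value of --bai_ext (so sample.bam pairs with sample.bam.bai by default); check this against your own file naming.

```python
from pathlib import Path


def bams_missing_index(folder, bai_ext=".bam.bai"):
    """List bam files in `folder` that have no index file next to them.

    Assumption: the index is named <base><bai_ext>, where <base> is the
    bam file name without its ".bam" ending.
    """
    missing = []
    for bam in sorted(Path(folder).glob("*.bam")):
        base = bam.name[: -len(".bam")]
        index = bam.with_name(base + bai_ext)
        if not index.exists():
            missing.append(bam.name)
    return missing
```

Running bams_missing_index("BAM/") returns the names of the bam files that would likely be ignored; an empty list means every bam file has an index under the assumed naming.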

What modifications have been done to the original NGSCheckMate code?

We provide a modified version of the graph/ngscheckmate2xgmml.R R script from NGSCheckMate to output graphs in .xgmml format. The modifications make it possible to represent all samples, including those that match, and fix a small glitch in the color palette.


Nicolas Alcala* Developer to contact for support
Maxime Vallée Developer