
Tool for copying read-alignment regions centered around variants (in VCF/VCF.GZ/BCF format) from existing genomes (in SAM/BAM/CRAM format) and inserting them into canvas genomes (in SAM/BAM/CRAM format).







R. Martín et al., “ONCOLINER: A new solution for monitoring, improving, and harmonizing somatic variant calling across genomic oncology centers,” Cell Genomics, vol. 4, no. 9. Elsevier BV, p. 100639, Sep. 2024. doi: 10.1016/j.xgen.2024.100639

GenomeMosaicMaker is a tool for copying read-alignment regions centered around variants (in VCF/VCF.GZ/BCF format) from existing genomes (in SAM/BAM/CRAM format) and inserting them into canvas genomes (in SAM/BAM/CRAM format) to generate mosaic genomes.

Existing validated datasets tend to be large and computationally expensive to process because they include many samples, even when they contain few validated variants. Mosaic genomes significantly reduce the benchmarking effort while retaining the intrinsic noise and read-level sequencing characteristics of the original samples, and therefore their assessment properties. The tool is written in Python.

GenomeMosaicMaker was developed within the second work package of EUCANCan (the EUropean-CANadian Cancer network) and is used to complement the benchmarking datasets of ONCOLINER. The generation workflow is provided as a standalone Python script with a command-line interface and is optimized for multi-core environments, specifically a single node of MareNostrum 4.





We recommend using singularity-ce with a version higher than 3.9.0. You can download the Singularity container using the following command (does not require root privileges):

singularity pull genome-mosaic-maker.sif docker://

If you want to build the container yourself, you can use the singularity.def file (requires root privileges):

sudo singularity build --force genome-mosaic-maker.sif singularity.def


You can download the Docker image using the following command:

docker pull

You can build the Docker image yourself with the following command (requires root privileges):

docker build -t genome-mosaic-maker .


GenomeMosaicMaker is a command-line tool. Assuming you have a singularity image called genome-mosaic-maker.sif, the process for building a mosaic genome (with a normal and tumor sample) is as follows:

singularity exec genome-mosaic-maker.sif python3 -O /genome-mosaic-maker/src/combinator/ -mp 32 -p 1000 -i input.vcf -if canvas_N.cram canvas_T.cram -o example_out_N.bam example_out_T.bam
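The `-p 1000` option in the command above sets the padding around each variant. As a rough illustration only, the sketch below assumes that the padding is a number of base pairs applied symmetrically on both sides of the variant and that coordinates are 1-based; neither detail is stated in this README. The function name `padded_region` is hypothetical and is not part of GenomeMosaicMaker.

```python
from typing import Optional, Tuple


def padded_region(pos: int, padding: int, end: Optional[int] = None) -> Tuple[int, int]:
    """Return a (start, stop) window around a variant.

    Assumption: padding is symmetric and measured in base pairs.
    Assumption: positions are 1-based, so the start is clamped to 1.
    """
    if padding < 0:
        raise ValueError("padding must be non-negative")
    variant_end = pos if end is None else end
    if variant_end < pos:
        raise ValueError("end must not be smaller than pos")
    start = max(1, pos - padding)
    stop = variant_end + padding
    return start, stop


# A point variant at position 12345 with the padding used in the example command.
print(padded_region(12345, 1000))  # (11345, 13345)
```

A larger padding copies more surrounding reads from the source genome into the canvas for each variant.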

The input VCF/BCF/VCF.GZ file must have a field named FILES in the INFO column. This field must contain the paths to the BAM/CRAM/SAM files that will be used to extract the reads. The paths must be separated by a comma (,). The number of files must match the number of input canvas files and the order must be the same. The BAM/CRAM/SAM files must be indexed. Check the example folder for an example of how to prepare the input VCF.
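The requirements above can be checked before launching a long run. The following is a minimal sketch that works on one uncompressed VCF record given as text. It only verifies that `FILES` is present in the INFO column and that the number of comma-separated paths equals the number of canvas files. It does not check that the files exist or are indexed. The helper names `parse_info` and `files_for_record`, and the file names in the sample record, are hypothetical.

```python
from typing import Dict, List, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Parse a VCF INFO column into a dict; flag entries map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def files_for_record(vcf_line: str, n_canvas: int) -> List[str]:
    """Return the paths listed in INFO/FILES, checking their count."""
    columns = vcf_line.rstrip("\n").split("\t")
    if len(columns) < 8:
        raise ValueError("VCF record has fewer than 8 columns")
    files = parse_info(columns[7]).get("FILES")
    if not isinstance(files, str) or not files:
        raise ValueError("INFO column has no FILES field")
    paths = files.split(",")
    if len(paths) != n_canvas:
        raise ValueError(
            f"FILES lists {len(paths)} path(s) but {n_canvas} canvas file(s) were given"
        )
    return paths


# Hypothetical record: the first path pairs with the first canvas file, and so on.
record = "chr1\t12345\t.\tA\tT\t.\tPASS\tFILES=donor_N.bam,donor_T.bam"
print(files_for_record(record, n_canvas=2))  # ['donor_N.bam', 'donor_T.bam']
```

Because the order of the paths must match the order of the canvas files, the returned list can be paired position by position with the canvas files passed on the command line.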


usage: [-h] --input INPUT --outputs OUTPUTS [OUTPUTS ...]
               [--canvas-files CANVAS_FILES [CANVAS_FILES ...]]
               [--padding PADDING] [--maximum-processes MAXIMUM_PROCESSES]
               [--fasta-ref FASTA_REF] [--split-read-groups]

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT, -i INPUT
                        Input VCF file
  --outputs OUTPUTS [OUTPUTS ...], -o OUTPUTS [OUTPUTS ...]
                        Output alignment files
  --canvas-files CANVAS_FILES [CANVAS_FILES ...], -if CANVAS_FILES [CANVAS_FILES ...]
                        canvas alignment files
  --padding PADDING, -p PADDING
                        Padding around the variants
  --maximum-processes MAXIMUM_PROCESSES, -mp MAXIMUM_PROCESSES
                        Maximum number of physical processes to use
  --fasta-ref FASTA_REF, -f FASTA_REF
                        Fasta reference file (used for CRAM files)
  --split-read-groups   Keep the read groups separate (they will be


  • Rodrigo Martín - Code and Scientific Methodology - ORCID GitHub
  • David Torrents - Scientific Methodology - ORCID


This project is licensed under the BSC Dual License - see the LICENSE file for details.








