Physlr: Next-generation Physical Maps
Physlr has two stages. The first, physical-map, constructs a de novo physical map using linked reads from 10X Genomics or MGI stLFR. This physical map can then be used for various genomics analyses, including scaffolding.
The second, scaffolds, uses the physical map generated in the first stage to scaffold an existing genome assembly to yield chromosome-level contiguity.
You can install Physlr either via Conda or by compiling from source. We recommend installing Physlr via the Conda package manager (Linux, macOS), which handles compilation and dependencies automatically.
Install Physlr using Conda
In an active Conda environment, run:
conda install -c bioconda physlr
physlr help
Physlr can generate complementary reports (included in the pipeline by default). You can install the dependencies for these optional features using Conda:
conda install -c r r-rmarkdown
conda install -c conda-forge r-ggplot2
We recommend using pypy3 over regular python3 for speed. pypy3 (PyPy for Python 3) is the default Python executable for Physlr. To switch to another executable, set python_executable on the command line:
physlr [OPTION]... python_executable=python3
You can install pypy3 using conda:
conda install -c conda-forge pypy3.8 # Change specified version based on your conda environment's python version (3.6 to 3.9 are supported)
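The choice between pypy3 and python3 described above can be sketched as a small shell helper. This is only an illustration: pick_python is a hypothetical function name, not part of Physlr, and the snippet prints the option rather than running anything.

```shell
# Print the name of the Python executable to pass as python_executable=...
# Prefers pypy3 (Physlr's default) and falls back to python3.
# pick_python is a hypothetical helper, not a Physlr command.
pick_python() {
    if command -v pypy3 >/dev/null 2>&1; then
        echo pypy3
    elif command -v python3 >/dev/null 2>&1; then
        echo python3
    else
        echo "no pypy3 or python3 found on PATH" >&2
        return 1
    fi
}

# Show the option to append to a physlr command line.
# Nothing is executed here; the option is only printed.
if py=$(pick_python); then
    echo "python_executable=$py"
fi
```

The printed option can then be appended to a Physlr command in the same way as python_executable=python3 in the example above.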
Compile Physlr from source
Compile Physlr using the following commands:
pip3 install --user git+https://github.com/bcgsc/physlr
git clone https://github.com/bcgsc/physlr
cd physlr/src && make install
Or, to install Physlr in a specified directory (like /opt/physlr):
pip3 install --user git+https://github.com/bcgsc/physlr
git clone https://github.com/bcgsc/physlr
cd physlr/src && make install PREFIX=/opt/physlr
After compiling, Physlr commands will be available through:
bin/physlr-make
bin/physlr-make help
- GCC 5 or newer with OpenMP and boost
- Python 3.5 or newer and the following packages
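Before compiling from source, the GCC and Python version requirements above can be checked with a short script. The following is a minimal sketch under stated assumptions: version_ge and check_tool are hypothetical helper names, the comparison handles purely numeric dotted versions only, and OpenMP, Boost, and the Python packages are not checked.

```shell
# version_ge A B: succeed if dotted version A >= B (numeric fields only).
# Hypothetical helper for illustration; not part of Physlr.
version_ge() (
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading field of each version; missing fields count as 0.
        x=${a%%.*}
        y=${b%%.*}
        x=${x:-0}
        y=${y:-0}
        if [ "$x" -gt "$y" ]; then exit 0; fi
        if [ "$x" -lt "$y" ]; then exit 1; fi
        # Drop the field that was just compared.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    exit 0
)

# check_tool NAME MIN_VERSION FOUND_VERSION: report whether FOUND is new enough.
check_tool() {
    if version_ge "$3" "$2"; then
        echo "$1 $3: ok (>= $2)"
    else
        echo "$1 $3: too old (need >= $2)"
        return 1
    fi
}

# Only tools that are actually on PATH are checked.
if command -v gcc >/dev/null 2>&1; then
    check_tool GCC 5 "$(gcc -dumpversion)" || true
fi
if command -v python3 >/dev/null 2>&1; then
    check_tool Python 3.5 \
        "$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || true
fi
```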
Generate a physical map
To construct a physical map de novo, you need linked reads (from 10X Genomics or MGI stLFR).
In this example, the linked reads dataset is called linkedreads.fq.gz. The linked reads are from stLFR so we specify protocol=stlfr to use the default value for stLFR reads.
cd experiment # Change to working directory
physlr physical-map lr=linkedreads protocol=stlfr # Constructs the physical map
You also have the option to provide a reference genome (with ref) for Physlr to evaluate the physical map. Assuming the reference is called reference.fa, you can run the following command for the previous example:
cd experiment
physlr physical-map lr=linkedreads ref=reference protocol=stlfr # Constructs the physical map and reference-based evaluations for it
If you provide a reference genome, Physlr first constructs a physical map and then maps it to the input reference. In this case, Physlr automatically outputs a *.map-quality.tsv file reporting assembly-like quality metrics for the physical map. In addition, Physlr visualizes the correctness and contiguity of the physical map.
You can also independently run the physical map construction and evaluation steps:
cd experiment
physlr physical-map lr=linkedreads protocol=stlfr
physlr map-quality lr=linkedreads ref=reference
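In the examples above, lr and ref take the file names without their extensions (linkedreads for linkedreads.fq.gz, reference for reference.fa). The sketch below builds the physical-map command line from file names on that assumption. prefix_of and physical_map_cmd are hypothetical helpers, not Physlr commands, and only the .fq.gz and .fa extensions seen in this README are handled. The command is printed, not executed.

```shell
# Strip a known extension to get the prefix used for lr= or ref=,
# following the examples above (linkedreads.fq.gz -> linkedreads).
# Other extensions are left untouched, since their handling is not documented here.
prefix_of() {
    case $1 in
        *.fq.gz) echo "${1%.fq.gz}" ;;
        *.fa)    echo "${1%.fa}" ;;
        *)       echo "$1" ;;
    esac
}

# physical_map_cmd READS PROTOCOL [REFERENCE]
# Prints (does not run) the physlr command for the given inputs.
physical_map_cmd() (
    cmd="physlr physical-map lr=$(prefix_of "$1")"
    if [ -n "${3:-}" ]; then
        # A reference was given, so add ref= to request the evaluation.
        cmd="$cmd ref=$(prefix_of "$3")"
    fi
    echo "$cmd protocol=$2"
)

physical_map_cmd linkedreads.fq.gz stlfr
physical_map_cmd linkedreads.fq.gz stlfr reference.fa
```

Printing the command first makes it easy to review the derived prefixes before running it in the working directory.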
Scaffold an assembly
To scaffold a draft assembly, you need linked reads from 10X Genomics or stLFR, and an existing assembly.
In this example, the linked reads and draft assembly are called linkedreads.fq.gz and draft.fa, respectively. The linked reads are from 10X Genomics so we specify protocol=10x to use the default value for 10X Genomics reads.
cd experiment
bin/physlr-make scaffolds lr=linkedreads draft=draft protocol=10x
You can also include a reference genome ('reference.fa' in this example) for Physlr to calculate Quast summary metrics for the Physlr scaffolded assembly:
cd experiment
bin/physlr-make scaffolds lr=linkedreads ref=reference draft=draft protocol=10x
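Because scaffolding needs both the linked reads and the draft assembly in the working directory, it can help to confirm the inputs exist before starting. The following is a minimal pre-flight sketch. inputs_ok is a hypothetical helper, not part of Physlr, and the file names are the ones used in this example.

```shell
# inputs_ok FILE...: succeed only if every file exists and is non-empty;
# report each problem file on stderr.
# Hypothetical helper for illustration; not a Physlr command.
inputs_ok() (
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    exit "$status"
)

# Check the scaffolding inputs, then print the command to run.
# The command itself is only printed here, not executed.
if inputs_ok linkedreads.fq.gz draft.fa; then
    echo "bin/physlr-make scaffolds lr=linkedreads draft=draft protocol=10x"
else
    echo "inputs not found in $(pwd); nothing to run"
fi
```

When a reference is used, reference.fa can be added to the list of files passed to the check.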
See the help page for further information.
lr.physlr.physical-map.path: Paths of barcodes (backbones).
lr.physlr.physical-map.ref.n10.paf.gz.*.pdf: Various graphs showing the contiguity and correctness of the backbones with respect to the reference.
draft.physlr.fa: Physlr scaffolded assembly using the physical map.
draft.physlr.quast.tsv: Quast metrics comparing the Physlr scaffolded assembly against the reference.
If you use Physlr in your research, please cite:
Afshinfard A, Jackman SD, Wong J, Coombe L, Chu J, Nikolic V, Dilek G, Malkoç Y, Warren RL, Birol I. Physlr: Next-Generation Physical Maps. DNA. 2022 Jun 10;2(2):116-30. doi: https://doi.org/10.3390/dna2020009
This project uses:
- btl_bloomfilter BTL C/C++ Common bloom filters for bioinformatics projects implemented by Justin Chu
- nthash rolling hash implementation by Hamid Mohamadi
- readfq Fast multi-line FASTA/Q reader API implemented by Heng Li
- robin-map C++ implementation of a fast hash map and hash set using robin hood hashing by Thibaut G.