Medhat edited this page May 11, 2022 · 2 revisions

Long-read sequencing has been shown to have advantages in structural variation (SV) detection and methylation calling. Many studies focus on only one of SV detection, methylation calling, or phasing of single-nucleotide variants (SNV); however, only the combination of these variant types provides comprehensive insight into a sample and thus enables novel findings in biology or medicine. PRINCESS is a structured workflow that takes raw sequence reads and generates a fully phased SNV, SV, and methylation call set within a few hours. PRINCESS achieves high accuracy and long-range phasing even on low-coverage datasets and can resolve repetitive, complex, medically relevant genes that often escape detection. PRINCESS is publicly available at https://github.com/MeHelmy/princess under the MIT license.

PRINCESS is more than a workflow wrapping existing tools: it includes multiple optimizations, QC approaches, and novel methods. Besides parameter optimizations, PRINCESS extends the principle of phasing variants to structural variations and also includes modules to phase methylation data (not shown here). This makes PRINCESS unique, as no other tool currently offers this level of comprehensiveness. Phasing SV remains challenging, however, because SV often cause problems for SNV calling, and thus for SNV phasing, through alignment artifacts or violations of simple assumptions, e.g., the expected ratio of heterozygous to homozygous SNV inside a duplication. By default, PRINCESS uses conservative settings that do not phase an SV if even one of its supporting reads shows a conflict. This leads to a lower SV phasing rate for HS1011. However, users can define a threshold that allows one or more reads to be in conflict, which enables a higher SV phasing rate. In our experiments, this does not lead to a significantly higher Hamming error rate. In addition, PRINCESS includes code to enable haplotype assessment of the methylation calls, which provides a comprehensive foundation for maximal analysis of a given sample.
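
The conflict threshold described above can be illustrated with a minimal Python sketch. This is not PRINCESS's actual code: the function name `phase_sv`, the parameter `max_conflicts`, and the encoding of read haplotypes as `1`, `2`, or `None` (unphased) are all assumptions made for illustration. The idea is that each read supporting an SV carries a haplotype assignment from SNV phasing, and the SV is phased only when the number of reads disagreeing with the majority haplotype stays within the allowed threshold.

```python
from collections import Counter
from typing import Iterable, Optional


def phase_sv(read_haplotypes: Iterable[Optional[int]],
             max_conflicts: int = 0) -> Optional[int]:
    """Assign an SV to haplotype 1 or 2, or return None if it stays unphased.

    read_haplotypes: haplotype of each SV-supporting read (1, 2, or None
                     for reads that could not be phased). Hypothetical input.
    max_conflicts:   hypothetical threshold; number of reads allowed to
                     disagree with the majority haplotype (0 = conservative).
    """
    # Count only reads that carry a haplotype assignment.
    counts = Counter(h for h in read_haplotypes if h in (1, 2))
    h1, h2 = counts[1], counts[2]

    # No phased reads, or a tie: there is no majority haplotype to assign.
    if h1 == h2:
        return None

    majority = 1 if h1 > h2 else 2
    conflicts = min(h1, h2)  # reads supporting the minority haplotype

    # Phase the SV only if the conflicts stay within the allowed threshold.
    return majority if conflicts <= max_conflicts else None


if __name__ == "__main__":
    reads = [1, 1, 1, 2, None]
    print(phase_sv(reads))                   # conservative setting
    print(phase_sv(reads, max_conflicts=1))  # one conflicting read tolerated
```

With the conservative setting (`max_conflicts=0`), a single conflicting read leaves the SV unphased; raising the threshold trades a higher phasing rate for the risk of assigning an SV based on partly inconsistent evidence.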
