
Post-Assembly Variant Finder (PAVFinder)

PAVFinder is a Python package that detects structural variants from de novo assemblies (e.g. those produced by RNA-Bloom, ABySS, or Trans-ABySS). It can analyse both genome and transcriptome assemblies:

genomic structural variants (`pavfinder genome`)

  • translocations
  • inversions
  • duplications
  • insertions
  • deletions
  • simple-repeat expansions/contractions

transcriptomic structural variants (`pavfinder fusion`)

  • gene fusions
  • internal tandem duplications (ITD)
  • partial tandem duplications (PTD)
  • small indels
  • simple-repeat expansions/contractions

transcriptomic splice variants (`pavfinder splice`)

  • skipped exons
  • novel exons
  • novel introns
  • retained introns
  • novel splice acceptors/donors

PAVFinder infers variants from non-contiguous (split or gapped) alignments of contig sequences to the reference genome. Assemblies can be aligned to the reference genome (contig-to-genome, or c2g, alignment) using bwa mem (genome) or gmap (transcriptome). Read support for events can be gathered by aligning reads to the assembly using bwa mem (read-to-contig, or r2c, alignment).
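To illustrate how a split alignment can point to a variant, here is a minimal sketch in Python. It is not PAVFinder's implementation: the `Segment` record, the `classify_split` function, and the coarse classification rules are all hypothetical and chosen only to show the idea that two adjacent pieces of one contig mapping to different places or orientations imply a rearrangement.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned piece of a contig (hypothetical record, not a PAVFinder type)."""
    chrom: str
    ref_start: int    # 1-based, inclusive
    ref_end: int      # 1-based, inclusive
    strand: str       # '+' or '-'
    query_start: int  # 1-based position on the contig
    query_end: int


def classify_split(a: Segment, b: Segment) -> str:
    """Coarsely classify the event implied by two segments of one contig.

    Assumes `a` precedes `b` on the contig. Real callers need many more
    checks (overlaps, microhomology, alignment quality, read support).
    """
    if a.chrom != b.chrom:
        return "translocation"
    if a.strand != b.strand:
        return "inversion"

    # Number of reference bases skipped between the two segments,
    # measured in the direction the contig travels along the reference.
    if a.strand == "+":
        ref_gap = b.ref_start - a.ref_end - 1
    else:
        ref_gap = a.ref_start - b.ref_end - 1
    # Number of contig bases left unaligned between the two segments.
    query_gap = b.query_start - a.query_end - 1

    if ref_gap > 0:
        return "deletion"
    if ref_gap < 0:
        return "duplication"
    if query_gap > 0:
        return "insertion"
    return "contiguous"


first = Segment("chr1", 100, 200, "+", 1, 101)
second = Segment("chr1", 301, 400, "+", 102, 201)
print(classify_split(first, second))  # deletion
```

Here the contig is fully covered by the two segments, but 100 reference bases between them are missing from the contig, so the sketch reports a deletion.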

We provide a Targeted-Assembly-Pipeline, TAP, to facilitate transcriptome analysis on selected genes. This requires a multi-index Bloom filter of the targeted gene sequences to be created beforehand. Whereas whole-transcriptome analysis with over 100 million read pairs can take more than 24 hours, a targeted analysis of several hundred genes (e.g. COSMIC) can be completed within half an hour. TAP uses Trans-ABySS for transcriptome assembly. TAP2 is the successor to TAP and uses RNA-Bloom for transcriptome assembly.
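The idea of using Bloom filters to pull out reads from targeted genes can be sketched as follows. This is a toy illustration only, not the multi-index Bloom filter that TAP requires: it keeps one plain Bloom filter per gene, and the class and function names, the k-mer size, the filter size, and the 50% hit threshold are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not TAP's data structure)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k):
    """Return all overlapping k-mers of seq (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_filters(targets, k=11):
    """Build one Bloom filter per target gene from its sequence."""
    filters = {}
    for gene, seq in targets.items():
        bf = BloomFilter()
        for kmer in kmers(seq, k):
            bf.add(kmer)
        filters[gene] = bf
    return filters


def assign_read(read, filters, k=11, min_fraction=0.5):
    """Return the gene whose filter matches the most k-mers of the read, or None."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    best_gene, best_hits = None, 0
    for gene, bf in filters.items():
        hits = sum(kmer in bf for kmer in read_kmers)
        if hits > best_hits:
            best_gene, best_hits = gene, hits
    if best_hits / len(read_kmers) >= min_fraction:
        return best_gene
    return None


# Made-up sequences standing in for targeted genes.
targets = {
    "GENE_A": "ACGTTGCAAGGCTTAACCGGATATCGCGTTAACCGGTTAAGGCC",
    "GENE_B": "GGATCCATGCAGTCAGTCCATGACGTAGCATCGATCGGCTAGCA",
}
filters = build_filters(targets)
print(assign_read(targets["GENE_A"][5:30], filters))  # GENE_A
```

Because a Bloom filter never produces false negatives, a read taken verbatim from a target sequence always matches that target. Only reads assigned to a gene would then need to be assembled, which is where the time savings of a targeted analysis come from.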

We also provide a pipeline for gene fusion detection in RNA-seq data, Fusion-Bloom, which couples PAVFinder with RNA-Bloom. We demonstrated that it has higher sensitivity and specificity than most state-of-the-art fusion callers.

Installation

See INSTALL.md

Usage

See USAGE.md

Test Data

See pavfinder/test for a small dataset to test our transcriptome (TAP, TAP2, and Fusion-Bloom) and genome workflows.

Publications

Readman Chiu, Ka Ming Nip, Justin Chu and Inanc Birol. TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data. BMC Med Genomics (2018) 11:79 https://doi.org/10.1186/s12920-018-0402-6

Readman Chiu, Ka Ming Nip, Inanc Birol. Fusion-Bloom: fusion detection in assembled transcriptomes. Bioinformatics (2019) btz902 https://doi.org/10.1093/bioinformatics/btz902