Post Assembly Variants Finder

Post-Assembly Variant Finder (PAVFinder)

PAVFinder is a Python package that detects structural variants from de novo assemblies (e.g. those produced by ABySS or Trans-ABySS). It can analyse both genome and transcriptome assemblies:

genomic structural variants (pavfinder genome)

  • translocations
  • inversions
  • duplications
  • insertions
  • deletions
  • simple-repeat expansions/contractions

transcriptomic structural variants (pavfinder fusion)

  • gene fusions
  • internal tandem duplications (ITD)
  • partial tandem duplications (PTD)
  • small indels
  • simple-repeat expansions/contractions

transcriptomic splice variants (pavfinder splice)

  • skipped exons
  • novel exons
  • novel introns
  • retained introns
  • novel splice acceptors/donors

PAVFinder infers variants from non-contiguous (split or gapped) alignments of contig sequences to the reference genome. Assemblies can be aligned to the reference genome (c2g alignment) using bwa mem (genome) or gmap (transcriptome). Read support for events can be gathered by aligning the reads to the assembly (r2c alignment) using bwa mem.
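To make the distinction between gapped and split alignments concrete, here is a minimal Python sketch that reads the CIGAR string of a single c2g alignment record. It is an illustration only and is not taken from PAVFinder's code: the function names `gapped_events` and `is_split`, and the `min_size` and `min_clip` thresholds, are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def gapped_events(cigar, ref_start, min_size=1):
    """Return (kind, ref_pos, size) for each insertion/deletion in one alignment.

    ref_start is the 1-based leftmost reference position (SAM POS field).
    ref_pos is the reference coordinate at which the event begins.
    min_size is a hypothetical cutoff, not a PAVFinder parameter.
    """
    events = []
    ref_pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I" and length >= min_size:
            events.append(("ins", ref_pos, length))
        elif op == "D" and length >= min_size:
            events.append(("del", ref_pos, length))
        # Only these operations consume reference bases.
        if op in "MDN=X":
            ref_pos += length
    return events


def is_split(cigar, min_clip=20):
    """Report whether either end of the contig is clipped by at least min_clip
    bases, which suggests the rest of the contig aligns somewhere else."""
    ops = CIGAR_OP.findall(cigar)
    if not ops:
        return False
    end_clips = [int(n) for n, op in (ops[0], ops[-1]) if op in "SH"]
    return any(clip >= min_clip for clip in end_clips)


print(gapped_events("50M10D40M", 100))
print(is_split("60M40S"))
```

A gapped alignment keeps the whole contig in one record with an internal insertion or deletion, whereas a split alignment leaves part of the contig clipped and aligned elsewhere; the latter is the kind of evidence that events such as translocations and inversions produce.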

To facilitate whole-transcriptome analysis, a pipeline called TAP (Transabyss-Alignment-PAVFinder) is provided that bundles the three analysis steps: assembly, alignment, and variant detection. TAP can also be run in a targeted mode on selected genes. This requires a Bloom filter of the targeted gene sequences to be created beforehand. Whereas the full assembly of a single RNA-seq library with over 100 million read pairs takes more than 24 hours, a targeted assembly and analysis of a list of several hundred genes (e.g. COSMIC) can be completed within half an hour.
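The role of the Bloom filter in targeted mode can be illustrated with a small Python sketch: the k-mers of the target gene sequences are stored in a bit array, and a read is kept when enough of its k-mers are found there. This shows the general idea only and is not the tool or file format that TAP uses; the class `KmerBloomFilter`, its parameters and the hashing scheme are all assumptions made for the example.

```python
import hashlib


class KmerBloomFilter:
    """Toy Bloom filter over k-mers (hypothetical, for illustration only)."""

    def __init__(self, num_bits=1 << 20, num_hashes=3, k=25):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.k = k
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, kmer):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_sequence(self, seq):
        """Insert every k-mer of a target gene sequence."""
        seq = seq.upper()
        for i in range(len(seq) - self.k + 1):
            for pos in self._positions(seq[i:i + self.k]):
                self.bits[pos >> 3] |= 1 << (pos & 7)

    def contains(self, kmer):
        """True if the k-mer may be present; False if it is definitely absent."""
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(kmer.upper())
        )

    def read_hits(self, read):
        """Count the k-mers of a read that are found in the filter."""
        read = read.upper()
        return sum(
            self.contains(read[i:i + self.k])
            for i in range(len(read) - self.k + 1)
        )


bf = KmerBloomFilter(k=5)
bf.add_sequence("ACGTACGTTAGC")  # stands in for a target gene sequence
print(bf.read_hits("ACGTACG"))   # number of read k-mers seen in the target
```

A Bloom filter never misses a k-mer that was inserted but can occasionally report one that was not, so a real filter is sized to keep that false-positive rate low. Restricting the assembly to reads that hit the filter is what makes a targeted run so much smaller than a whole-transcriptome run.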