PAVFinder_transcriptome

Author: Readman Chiu (rchiu@bcgsc.ca)

PAVFinder_transcriptome (PVT) is a Python package that identifies structural variants in transcriptome assemblies. In a nutshell, the algorithm infers variants from non-contiguous (split or gapped) alignments of contig sequences to the reference genome. With the aid of gene-model annotation(s), diverse classes of variants, such as gene fusions, read-throughs, internal and partial tandem duplications, indels and novel splice variants, are identified and classified.
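The difference between a gapped and a split alignment can be illustrated with a minimal, self-contained sketch. It is not PVT's implementation: the `Alignment` record, the helpers `gapped_events` and `split_breakpoints`, and the `min_gap` threshold are hypothetical. Alignments are assumed to be SAM-style records (1-based position plus CIGAR string), and every block of a split contig is assumed to lie on the forward strand, so that the leading clip gives the block's order along the contig. A gapped alignment shows up as large `I`/`D` operations within one record; a split alignment shows up as several records for the same contig.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# CIGAR operations as defined in the SAM specification
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


@dataclass
class Alignment:
    """Hypothetical minimal record for one contig-to-genome alignment."""
    contig: str
    chrom: str
    pos: int    # 1-based leftmost reference position, as in SAM
    cigar: str


def ref_end(aln):
    """Last reference base covered by the alignment (1-based, inclusive)."""
    span = sum(int(n) for n, op in CIGAR_RE.findall(aln.cigar) if op in REF_CONSUMING)
    return aln.pos + span - 1


def query_start(aln):
    """Bases clipped before the aligned block (assumes forward strand)."""
    n, op = CIGAR_RE.findall(aln.cigar)[0]
    return int(n) if op in "SH" else 0


def gapped_events(aln, min_gap=5):
    """Candidate insertions/deletions of at least min_gap bases in one alignment.

    'N' operations are treated as introns rather than variants; comparing them
    with the annotation is where novel splice variants would be looked for.
    """
    events, ref = [], aln.pos
    for n, op in CIGAR_RE.findall(aln.cigar):
        n = int(n)
        if op == "D" and n >= min_gap:
            events.append(("deletion", aln.chrom, ref, n))
        elif op == "I" and n >= min_gap:
            events.append(("insertion", aln.chrom, ref, n))
        if op in REF_CONSUMING:
            ref += n
    return events


def split_breakpoints(alignments):
    """Junctions between adjacent blocks of contigs aligned in several pieces."""
    by_contig = defaultdict(list)
    for aln in alignments:
        by_contig[aln.contig].append(aln)
    pairs = []
    for contig, alns in by_contig.items():
        alns.sort(key=query_start)  # order blocks along the contig
        for left, right in zip(alns, alns[1:]):
            pairs.append((contig, left.chrom, ref_end(left), right.chrom, right.pos))
    return pairs
```

For example, a contig aligned as `60M40S` to chr1 and `60H40M` to chr12 yields a single chr1-to-chr12 junction, the kind of evidence from which a fusion could be called once both sides are matched to gene annotations.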

The program is usually preceded by de novo assembly of RNA-seq reads and alignment of the resulting contigs to the reference genome. A pipeline called TAP (Transabyss-Alignment-PAVFinder), which bundles these three analysis steps, is therefore provided as a standalone application. TAP can also be run in targeted mode on selected genes; this requires a Bloom filter of the target gene sequences to be created beforehand. Whereas a full assembly of an RNA-seq library with over 100 million read pairs takes more than 24 hours, a targeted assembly and analysis of a list of several hundred genes (e.g. COSMIC) can be completed within half an hour.
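The role of the Bloom filter in targeted mode can be sketched as read recruitment: k-mers of the target gene sequences are inserted into the filter, and a read is kept when enough of its k-mers test positive. This is an illustrative pure-Python sketch, not the tool TAP uses to build or query its `.bf` file; the class and function names and all parameters (filter size, hash count, `k`, `min_hits`) are assumptions. K-mers are stored in canonical form (the lexicographically smaller of a k-mer and its reverse complement) so that reads from either strand match.

```python
import hashlib

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq, k):
    """Yield the canonical k-mers of a sequence."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield canonical(seq[i:i + k])


class BloomFilter:
    """Minimal Bloom filter: a bit array plus num_hashes salted SHA-256 hashes."""

    def __init__(self, size_bits=1 << 20, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # May give false positives, never false negatives
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_filter(target_seqs, k):
    """Insert every canonical k-mer of the target sequences into a new filter."""
    bf = BloomFilter()
    for seq in target_seqs:
        for km in kmers(seq, k):
            bf.add(km)
    return bf


def recruit(bf, read, k, min_hits=2):
    """Keep a read if at least min_hits of its k-mers are in the filter."""
    return sum(km in bf for km in kmers(read, k)) >= min_hits
```

A false positive only costs an extra read in the assembly, whereas a false negative would lose evidence, which is why a structure with no false negatives suits this recruitment step.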

Requirements

  1. External software

  2. Reference files

    • a single reference genome FASTA file, indexed by both samtools faidx and GMAP
    • gene model(s) in GTF format, with chromosome names matching those of the reference genome

See INSTALL.md for more details.
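Because chromosome names in the gene models must match those of the reference genome (for instance `chr1` versus `1`), a quick consistency check can compare the first column of the GTF file with the sequence names in the `.fai` index written by `samtools faidx`. This is a minimal sketch; the function names are hypothetical.

```python
def fai_names(fai_lines):
    """Sequence names: first tab-separated column of a samtools .fai index."""
    return {line.split("\t", 1)[0] for line in fai_lines if line.strip()}


def gtf_names(gtf_lines):
    """Sequence names used in a GTF file, skipping comment/header lines."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }


def unmatched_names(gtf_lines, fai_lines):
    """GTF sequence names that do not appear in the genome index, sorted."""
    return sorted(gtf_names(gtf_lines) - fai_names(fai_lines))
```

Both arguments can be open file objects, e.g. `unmatched_names(open("genes.gtf"), open("genome.fa.fai"))` with placeholder paths; an empty list means every GTF sequence name is present in the genome.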

Usage

  1. Run PVT (for structural variants)

    find_sv.py --gbam <contigs_to_genome_bam> --tbam <contigs_to_transcripts_bam> --transcripts_fasta <indexed_transcripts_fasta> --genome_index <GMAP index genome directory and name> --r2c <reads_to_contigs_bam> <contigs_fasta> <gtf> <genome_fasta> <outdir>
    
  2. Run TAP

    tap.py <sample> <outdir> --bf <target_genes.bf> --fq_list <file_listing_FASTQ_pairs> --k <space-delimited k values> --readlen <read_length> --nprocs <number_of_processes> --params <parameters_file>
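When TAP is launched from another Python script, each argument of the TAP command becomes a separate list element; in particular, the space-delimited k values passed to `--k` become separate arguments rather than one quoted string. A minimal sketch with a hypothetical helper, `tap_command`, and placeholder file names and values; only the options shown in the usage line are used.

```python
import shlex


def tap_command(sample, outdir, bf, fq_list, ks, readlen, nprocs, params):
    """Argument list for tap.py, mirroring the usage line (hypothetical helper)."""
    return [
        "tap.py", sample, outdir,
        "--bf", bf,
        "--fq_list", fq_list,
        "--k", *(str(k) for k in ks),  # one argv element per k value
        "--readlen", str(readlen),
        "--nprocs", str(nprocs),
        "--params", params,
    ]


# Placeholder inputs; substitute your own sample, files and k values
cmd = tap_command("sample1", "tap_out", "target_genes.bf", "fastq_pairs.txt",
                  [32, 62, 92], 100, 8, "params.cfg")
print(shlex.join(cmd))  # shell-quoted form, useful for logging
# To launch it: subprocess.run(cmd, check=True)
```

Passing a list (rather than a single shell string) avoids quoting problems with file paths that contain spaces.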
    

See USAGE.md for more details.