A collection of scripts for determining the haplotype composition of viral populations from sequence data generated with Pacific Biosciences (PacBio) and Illumina sequencing technologies.
- avg_pool_quality.py - Calculates and records the average pool quality at each position in each segment
  - requires a directory with BAM files, the reference sequence, and a file containing pool info for each sample
- avg_sample_quality.py - Calculates and records the average sample quality at each position in each segment
  - requires a directory with BAM files and the reference sequence
- config.py - File that assigns the parameters required by pacbio_haplotype.py
- convert_consensus.py - converts consensus files so that each output file corresponds to one sample and contains the consensus sequences of all its segments
- haplotype_visualization.R - script used to generate plots
  - Must manually set the segment (global variable at the top of the script)
  - requires the *tidy_haplotypes.csv and *tidier_haplotypes.csv files generated by pacbio_haplotype.py
- illumina_linkage_plots.R - generates the linkage plots for Illumina data. Each plot represents a single PacBio haplotype from one sample on one day; the top line of the plot represents the PacBio haplotype.
- illumina_utility.py - contains functions for attempting mutation linkage with Illumina data
- io_utility.py - helper functions for reading, processing, and writing data
- pacbio_haplotype.py - main functions for haplotype determination
  - requires config.py, io_utility.py, process_reads.py, illumina_utility.py
- plot_transmission_pairs.py - automates the comparisons between the F0 and F1 generations and generates the corresponding plots
- process_reads.py - helper functions for dealing with PacBio or Illumina read data
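
To illustrate the per-position averaging that avg_sample_quality.py and avg_pool_quality.py describe, here is a minimal sketch in plain Python. It is not the code from those scripts: it assumes reads have already been pulled out of the BAM files into hypothetical (segment, start, qualities) tuples, it ignores insertions and deletions, and the function name and segment labels are invented for the example.

```python
from collections import defaultdict


def average_quality_by_position(reads):
    """Average Phred quality at each (segment, position).

    `reads` is an iterable of (segment, start, qualities) tuples, where
    `start` is the 0-based reference position of the first aligned base and
    `qualities` lists one Phred score per consecutive reference position.
    This tuple layout is an assumption made for the sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # (segment, pos) -> [sum, count]
    for segment, start, qualities in reads:
        for offset, quality in enumerate(qualities):
            entry = totals[(segment, start + offset)]
            entry[0] += quality
            entry[1] += 1
    # Divide each running sum by the number of reads covering that position.
    return {key: total / count for key, (total, count) in sorted(totals.items())}


# Hypothetical reads covering two segments.
example_reads = [
    ("seg1", 0, [30, 40]),
    ("seg1", 1, [20, 10]),
    ("seg2", 0, [35]),
]
averages = average_quality_by_position(example_reads)
for (segment, position), mean_quality in averages.items():
    print(segment, position, mean_quality)
```

A pool-level average could follow the same pattern by grouping reads from every sample in a pool before averaging; how the scripts actually define and read the pool info is not shown here.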

To do:

- Clean up, organize, and better document the code
- Develop a visualization tool