HPV-genotyping

--David B. Stern, Ph.D.--

This pipeline was developed to process Illumina sequencing data generated with the CD Genomics HPV Capture Kit.

Usage

The scripts directory contains bash and R scripts to process the data.

The ref directory contains a FASTA file of papillomavirus reference genomes from PaVE, including non-reference genomes. The file was borrowed from HPV-EM, which has nicely reformatted sequence names. This reference FASTA must be indexed with Bowtie 2 before mapping.
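Indexing only needs to happen once, so it can help to check for an existing index before building one. The helper below is a minimal sketch under stated assumptions, not part of this repository: it checks for the six files Bowtie 2 writes for an index basename (`.bt2` for small indexes, `.bt2l` for large ones) and returns the `bowtie2-build <reference_in> <bt2_base>` argument list. The function names are hypothetical.

```python
from pathlib import Path

# Bowtie 2 writes six index files per basename (four forward, two reverse).
# Small indexes end in .bt2; large indexes end in .bt2l.
BT2_PARTS = [".1", ".2", ".3", ".4", ".rev.1", ".rev.2"]


def bowtie2_index_exists(basename: str) -> bool:
    """True if every file of a small or a large Bowtie 2 index is present."""
    for ext in (".bt2", ".bt2l"):
        if all(Path(f"{basename}{part}{ext}").exists() for part in BT2_PARTS):
            return True
    return False


def bowtie2_build_command(fasta: str, basename: str) -> list[str]:
    """Argument list for `bowtie2-build <reference_in> <bt2_base>`."""
    return ["bowtie2-build", fasta, basename]
```

For example, with hypothetical paths: `if not bowtie2_index_exists("ref/reference"): subprocess.run(bowtie2_build_command("ref/reference.fasta", "ref/reference"), check=True)`.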

All scripts rely on a file called files.txt, which contains the names of all the samples to be processed.
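In an array job, each task typically processes one sample. The sketch below shows one way to map a 1-based array task ID to a line of files.txt. It assumes one sample name per line, skips blank lines, and uses hypothetical function names. It is illustrative only and is not how the repository's bash scripts are written.

```python
def read_samples(path: str = "files.txt") -> list[str]:
    """Return sample names, assuming one name per line; blank lines are skipped."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def sample_for_task(samples: list[str], task_id: int) -> str:
    """Map a 1-based grid-engine array task ID to the sample it should process."""
    if not 1 <= task_id <= len(samples):
        raise ValueError(f"task ID {task_id} is outside 1..{len(samples)}")
    return samples[task_id - 1]
```

Suppose an array job is submitted with `qsub -t 1-N`, where N is the number of samples. UGE then sets the `SGE_TASK_ID` environment variable in each task, so a task could call `sample_for_task(read_samples(), int(os.environ["SGE_TASK_ID"]))`.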

clean_map_abundance.sh: UGE array script that cleans reads with BBDuk, maps them to the reference genomes with Bowtie 2, and estimates the relative abundance of each genotype using msamtools.
stats.sh: Collects coverage and pairwise identity statistics from the BAM files using awk and NanoStat, and generates coverage plots using collect_stats.R. It should be run from the directory containing the BAM files. Be sure to check the paths to the reference FASTA, the index, and collect_stats.R.
collect_stats.R: R script that generates a table of statistics and coverage plots. It is run automatically by stats.sh and requires the tidyverse R package.
merge_msamtools_stats.R: R script that merges the msamtools output with the collect_stats.R output for each sample.
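Relative abundance from read mapping is usually normalized by genome length, because a longer genome attracts more reads at the same abundance. The function below sketches that idea on per-genome read counts. It is a simplified stand-in, not msamtools itself: it ignores how reads that map equally well to several genomes are handled, and msamtools' exact calculation may differ. The genome names and lengths in the example are hypothetical.

```python
def relative_abundance(counts: dict[str, int], lengths: dict[str, int]) -> dict[str, float]:
    """Length-normalized relative abundance of each reference genome.

    Each genome's read count is divided by its length, and the resulting
    densities are rescaled to sum to 1. Genomes with zero reads are omitted.
    """
    density = {g: n / lengths[g] for g, n in counts.items() if n > 0}
    total = sum(density.values())
    if total == 0:
        return {}
    return {g: d / total for g, d in density.items()}
```

For example, 100 reads on a 1,000 bp genome and 100 reads on a 2,000 bp genome give relative abundances of about 0.67 and 0.33.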

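The exact coverage statistics that stats.sh computes with awk are not spelled out in this README. As an illustration only, the function below computes two common per-genome summaries: breadth of coverage (the fraction of positions with depth at least `min_depth`) and mean depth. It assumes tab-separated per-position depths in the three-column layout printed by `samtools depth -a` (reference, position, depth, with zero-depth positions included). The function name and the `min_depth` parameter are hypothetical.

```python
from collections import defaultdict


def coverage_stats(depth_lines, min_depth: int = 1) -> dict[str, dict[str, float]]:
    """Per-reference breadth of coverage and mean depth.

    Each line is 'reference<TAB>position<TAB>depth', with every position
    listed, including zero-depth ones.
    """
    n = defaultdict(int)        # positions seen per reference
    covered = defaultdict(int)  # positions with depth >= min_depth
    total = defaultdict(int)    # summed depth per reference
    for line in depth_lines:
        ref, _pos, depth = line.rstrip("\n").split("\t")
        d = int(depth)
        n[ref] += 1
        total[ref] += d
        if d >= min_depth:
            covered[ref] += 1
    return {
        ref: {"breadth": covered[ref] / n[ref], "mean_depth": total[ref] / n[ref]}
        for ref in n
    }
```

A genotype with high relative abundance but low breadth may reflect reads piling up on a short shared region rather than a genuine infection. Viewing the two statistics side by side is the reason for merging abundance and coverage results per sample.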