TheDBStern/HPV-genotyping


HPV-genotyping

--David B. Stern, Ph.D.--

This pipeline was developed to process Illumina sequencing data generated with CD Genomics' HPV Capture Kit.

Usage

The scripts directory contains bash and R scripts to process the data.

The ref directory contains a FASTA file of papillomavirus reference genomes from PaVE, including non-reference genomes. The file is borrowed from HPV-EM, which has nicely reformatted sequence names. This reference FASTA must be indexed with Bowtie 2 before reads can be mapped.
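The indexing step could look something like the sketch below. The FASTA path `ref/reference.fasta` and the index prefix `ref/hpv_index` are placeholders chosen for illustration, not the names used in this repository; `bowtie2-build` is the standard Bowtie 2 indexing command.

```shell
#!/usr/bin/env bash
# Sketch: build a Bowtie 2 index for the reference FASTA.
# NOTE: the paths in the example call are hypothetical placeholders.

index_reference() {
    ref_fasta="$1"      # path to the reference FASTA
    index_prefix="$2"   # basename for the index files Bowtie 2 writes

    # Refuse to run on a missing or empty FASTA.
    if [ ! -s "$ref_fasta" ]; then
        echo "reference FASTA not found or empty: $ref_fasta" >&2
        return 1
    fi

    # Make sure Bowtie 2 is installed before calling it.
    if ! command -v bowtie2-build >/dev/null 2>&1; then
        echo "bowtie2-build not found on PATH" >&2
        return 2
    fi

    bowtie2-build "$ref_fasta" "$index_prefix"
}

# Example call (uncomment and adjust the paths):
# index_reference ref/reference.fasta ref/hpv_index
```

The index prefix chosen here is what the mapping step would later pass to `bowtie2 -x`.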

All scripts rely on a file called files.txt, which lists the names of all samples to be processed.
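As a rough illustration of how a script might consume files.txt, the sketch below loops over the sample names. It assumes one sample name per line, which the README does not state explicitly; `list_samples` is a hypothetical helper, not a function from this repository.

```shell
#!/usr/bin/env bash
# Sketch: iterate over the sample names in files.txt.
# ASSUMPTION: one sample name per line; blank lines are skipped.

list_samples() {
    sample_list="${1:-files.txt}"
    # The "|| [ -n ... ]" clause also picks up a last line without a trailing newline.
    while IFS= read -r sample || [ -n "$sample" ]; do
        if [ -n "$sample" ]; then
            printf '%s\n' "$sample"
        fi
    done < "$sample_list"
}

# Example: report each sample that would be processed.
# list_samples files.txt | while IFS= read -r s; do echo "processing $s"; done
```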

  • clean_map_abundance.sh: Univa Grid Engine (UGE) array job script that cleans reads with BBDuk, maps them to the reference genomes with Bowtie 2, and estimates the relative abundance of each genotype with msamtools.

  • stats.sh: Collects coverage and pairwise identity statistics from the BAM files using awk and NanoStat, and generates coverage plots using collect_stats.R. Run it from the directory containing the BAM files. Before running, check that the paths to the reference FASTA, the index, and collect_stats.R are correct.

  • collect_stats.R: R script that generates a table of statistics and coverage plots. It is run automatically by stats.sh and requires the tidyverse R package.

  • merge_msamtools_stats.R: R script that merges the msamtools output with the collect_stats.R output for each sample.
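Because clean_map_abundance.sh is an array script, each task presumably needs to pick its own sample from files.txt. One common way to do this under Grid Engine is sketched below; it assumes one sample per line and tasks numbered from 1, and it is not taken from the actual script. `sample_for_task` is a hypothetical helper, while `SGE_TASK_ID` is the task index that Grid Engine sets for array jobs.

```shell
#!/usr/bin/env bash
# Sketch: map an array task index to a sample name.
# ASSUMPTION: files.txt has one sample per line and tasks are numbered 1..N.

sample_for_task() {
    task_id="$1"
    sample_list="${2:-files.txt}"

    # Accept only positive integers as task IDs.
    case "$task_id" in
        ''|*[!0-9]*|0)
            echo "invalid task id: $task_id" >&2
            return 1
            ;;
    esac

    # Print the line whose number matches the task ID.
    sed -n "${task_id}p" "$sample_list"
}

# Inside an array job one might write:
# sample=$(sample_for_task "$SGE_TASK_ID" files.txt)
```

Under these assumptions, the job could be submitted with something like `qsub -t 1-N clean_map_abundance.sh`, where N is the number of lines in files.txt.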

About

Collection of scripts and commands for analyzing Illumina-sequence-based HPV genotyping data
