These are some of the scripts I wrote to analyze our Group B Streptococcus (Streptococcus agalactiae) isolates. I am currently adapting them so they can be used for any further bacterial gene analysis our lab might require.
- These scripts use several established bioinformatics tools, including:
  - StrainSeeker (finding the closest genetic match)
  - SPAdes (contig assembly)
  - Raspberry (read quality)
  - SRST2 (MLST and capsular serotype calling)
  - RGI (AMR calling)
  - RAxML (phylogenetic tree building)
  - hierBAPS (statistical grouping of samples by SNP differences)
  - ClonalFrameML, abbreviated CFML (recombination-aware correction of the phylogenetic tree)
  - BactDating (dating of the phylogenetic tree)
- Given fastq.gz (gzip-compressed) files, the scripts create:
  - Multi-Locus Sequence Type (MLST) and capsular serotype files
  - Antimicrobial resistance (AMR) gene detection results
  - Corrected fastq files (gzip-compressed)
  - De novo assembled .fasta files
- Given the fastq files for a selection of isolates, a REFERENCE file, and an EXCLUSION file, the scripts create:
  - VCF files of SNPs called against the reference genome
  - A SNP log file with all positions marked
  - Pseudo-genomes built from the SNPs and the complete REFERENCE genome
  - hierBAPS grouping from the SNPs (overly simplistically: grouping by distance)
  - A phylogenetic tree file built with RAxML
  - ClonalFrameML (CFML) output from the RAxML tree
  - BactDating output from the CFML output
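
The pseudo-genome step can be pictured with a small sketch. This is not the code from these scripts: the function name `build_pseudogenome`, the dict of 1-based SNP positions, and the choice to mask regions from the EXCLUSION file with `N` are all assumptions made for illustration, and the sequences are toy data.

```python
def build_pseudogenome(reference, snps, excluded=()):
    """Return the reference with SNP alleles substituted and excluded regions masked.

    reference: reference sequence as a string
    snps:      dict mapping 1-based position -> alternate base (assumed input shape)
    excluded:  iterable of (start, end) regions, 1-based and inclusive,
               masked with 'N' (assumed meaning of the EXCLUSION file)
    """
    seq = list(reference)

    # Substitute each called SNP into the reference.
    for pos, alt in snps.items():
        if not 1 <= pos <= len(seq):
            raise ValueError(f"SNP position {pos} is outside the reference")
        seq[pos - 1] = alt

    # Mask excluded regions last so they override any SNP inside them.
    for start, end in excluded:
        for i in range(max(start, 1) - 1, min(end, len(seq))):
            seq[i] = "N"

    return "".join(seq)


# Toy example: a 10-base reference, two SNPs, one excluded region.
reference = "ACGTACGTAC"
snps = {2: "T", 7: "A"}
excluded = [(4, 5)]
pseudo = build_pseudogenome(reference, snps, excluded)
print(pseudo)  # ATGNNCATAC
```

Every pseudo-genome produced this way has the same length as the reference, so the sequences are already aligned position by position, which is the form that tree-building and grouping tools generally expect.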
The output of these scripts was used to create several text files to annotate trees in iTOL (a phylogenetic tree visualization website) and to create some graphs with seaborn and matplotlib for internal use. The code has not yet been cleaned up and converted to a template format, but I am working on it.
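
As an illustration of what such an iTOL annotation file can look like, here is a minimal sketch that builds a color strip dataset from a per-isolate serotype table. The function name `itol_colorstrip`, the isolate IDs, and the colors are hypothetical and not taken from these scripts; the leaf IDs would have to match the tip labels in the tree.

```python
def itol_colorstrip(label, leaf_values, palette):
    """Build the text of an iTOL DATASET_COLORSTRIP annotation file.

    label:       dataset name shown in iTOL
    leaf_values: dict mapping tree leaf ID -> category (e.g. serotype)
    palette:     dict mapping category -> hex color
    """
    lines = [
        "DATASET_COLORSTRIP",
        "SEPARATOR TAB",
        f"DATASET_LABEL\t{label}",
        "COLOR\t#000000",
        "DATA",
    ]
    # One line per leaf: ID, strip color, and the label shown on hover.
    for leaf, value in leaf_values.items():
        lines.append(f"{leaf}\t{palette[value]}\t{value}")
    return "\n".join(lines) + "\n"


# Hypothetical isolates and colors, for illustration only.
serotypes = {"isolate_01": "Ia", "isolate_02": "III", "isolate_03": "Ia"}
palette = {"Ia": "#1f77b4", "III": "#d62728"}
annotation = itol_colorstrip("Capsular serotype", serotypes, palette)
print(annotation)
```

The returned string can be saved to a text file and dragged onto the tree in iTOL to show the strip next to the tips.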