Group B Strep pipelines for large dataset gene analysis from Illumina sequencing.

darakibr/GBS

GBS data analysis code

Python and Bash scripts for running data analysis on gene data

These are some of the scripts I have written to analyse our Group B Streptococcus (Streptococcus agalactiae) data. I am currently adapting them so they can be used for any further bacterial gene analysis our lab might require.

  1. These scripts use several established bioinformatics tools, including:
    • StrainSeeker (closest genetic match)
    • SPAdes (contig assembly)
    • Raspberry (read quality)
    • SRST2 (MLST and capsular serotype calling)
    • RGI (AMR calling)
    • RAxML (phylogenetic tree building)
    • hierBAPS (statistical grouping of samples by SNP differences)
    • ClonalFrameML, CFML (recombination-aware phylogenetic tree building)
    • BactDating (historical phylogenetic inference)

The template files can be edited to run gene analysis on new datasets.

The templates based on .R scripts are modified by the template_treepipe.sh script.

template_denovopipe.sh

  1. Takes fastq.gz (gzip-compressed) files and creates:
    • Multi-locus sequence type (MLST) and capsular serotype files
    • Antimicrobial resistance gene detection results
    • Corrected fastq files (gzip-compressed)
    • De novo assembled .fasta files
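
As a rough illustration of the input step, the sketch below groups paired-end fastq.gz files by sample before they are handed to a pipeline. The naming convention (`<sample>_1.fastq.gz` / `<sample>_2.fastq.gz`, optionally with an `R` before the mate number), the function name `pair_fastq_files`, and the sample names are assumptions made for this example; they are not taken from template_denovopipe.sh.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention (hypothetical, not from the repository):
# <sample>_1.fastq.gz / <sample>_2.fastq.gz, or <sample>_R1 / <sample>_R2
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R?(?P<mate>[12])\.fastq\.gz$")


def pair_fastq_files(filenames):
    """Group paired-end read files by sample name.

    Returns a dict mapping sample -> (mate 1 file, mate 2 file).
    Files that do not match the pattern, and samples with only one
    mate present, are left out.
    """
    by_sample = defaultdict(dict)
    for name in filenames:
        match = READ_PATTERN.match(Path(name).name)
        if match:
            by_sample[match["sample"]][match["mate"]] = name

    pairs = {}
    for sample in sorted(by_sample):
        mates = by_sample[sample]
        if "1" in mates and "2" in mates:
            pairs[sample] = (mates["1"], mates["2"])
    return pairs


if __name__ == "__main__":
    files = [
        "GBS001_1.fastq.gz",
        "GBS001_2.fastq.gz",
        "GBS002_R1.fastq.gz",  # mate 2 missing, so this sample is skipped
        "notes.txt",
    ]
    for sample, (r1, r2) in pair_fastq_files(files).items():
        print(sample, r1, r2)
```

Checking for complete pairs up front means a missing mate shows up before any long-running assembly or typing step starts.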

template_treepipe.sh

  1. Takes the fastq files of a selection of isolates, together with a REFERENCE file and an EXCLUSION file, and creates:
    • VCF files of SNPs called against the reference genome
    • SNP log file with all positions marked
    • Pseudo-genomes built from the SNPs and the complete REFERENCE genome
    • hierBAPS grouping using SNPs (overly simplistically: grouping by distance)
    • Phylogenetic tree file using RAxML
    • ClonalFrameML (CFML) output from the RAxML tree
    • BactDating output from the CFML output
    * Depends on template_bacdate.R and template_hierbaps.R
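
To make the pseudo-genome and "grouping by distance" ideas above concrete, here is a minimal sketch. It assumes SNPs are given as 1-based positions with a single alternate base, as in VCF coordinates. The function names `build_pseudogenome` and `snp_distance` and the toy sequences are hypothetical; the actual scripts may build pseudo-genomes differently, and hierBAPS uses a statistical clustering model rather than a raw distance.

```python
def build_pseudogenome(reference, snps):
    """Return a copy of `reference` with SNP alleles substituted in.

    reference: reference genome sequence as a string
    snps: dict mapping 1-based position -> alternate base
          (assumed input shape for this sketch)
    """
    seq = list(reference)
    for pos, alt in snps.items():
        if not 1 <= pos <= len(seq):
            raise ValueError(f"SNP position {pos} is outside the reference")
        seq[pos - 1] = alt  # convert 1-based position to 0-based index
    return "".join(seq)


def snp_distance(seq_a, seq_b):
    """Count positions at which two equal-length pseudo-genomes differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("pseudo-genomes must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    reference = "ACGTACGT"  # toy reference, not real data
    isolate = build_pseudogenome(reference, {2: "T", 8: "A"})
    print(isolate)
    print(snp_distance(reference, isolate))
```

Because every pseudo-genome has the same length as the reference, the sequences are already aligned column by column, which is what tree building and SNP-based grouping need as input.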

CCisolatetotxt.py

This was used to create several text files to annotate trees in iTOL (a phylogenetic tree visualization website), and to create some graphs using seaborn and matplotlib for internal use. It has not yet been cleaned up and converted to a template format, but I am working on it.
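
For readers unfamiliar with iTOL annotation files, the sketch below writes a simple colour-strip dataset as tab-separated text. The header keywords follow my understanding of iTOL's colour strip template and should be checked against the current iTOL documentation. The function name `itol_colorstrip`, the isolate IDs, and the colours are hypothetical, and this is not the code in CCisolatetotxt.py.

```python
def itol_colorstrip(label, leaf_colors, legend_color="#000000"):
    """Build the text of an iTOL colour-strip annotation file.

    label: dataset label shown in iTOL
    leaf_colors: dict mapping tree leaf ID -> hex colour
    legend_color: colour used for the dataset in the iTOL legend
    """
    lines = [
        "DATASET_COLORSTRIP",
        "SEPARATOR TAB",
        f"DATASET_LABEL\t{label}",
        f"COLOR\t{legend_color}",
        "DATA",
    ]
    # One line per leaf: leaf ID and its colour, tab separated
    lines += [f"{leaf}\t{color}" for leaf, color in leaf_colors.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical isolates, coloured by an example grouping
    text = itol_colorstrip(
        "Clonal complex",
        {"GBS001": "#1f77b4", "GBS002": "#ff7f0e"},
    )
    with open("clonal_complex_colorstrip.txt", "w") as handle:
        handle.write(text)
```

The leaf IDs must match the tip labels in the tree file exactly, otherwise the annotation will not attach to the corresponding leaves.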
