
acuenod111/UPEC


This directory contains the scripts used to analyse the bacterial data in the study "Bacterial genome wide association study substantiates papGII of E. coli as a patient independent driver of urosepsis". Input files for these scripts are available from the Open Science Framework (OSF): https://osf.io/vmqc5/.

Vir_res_figures.R

This script combines the output of different tools which were used to characterise E. coli strains. This includes:

The file '825_strains_included.txt' lists one representative strain per clinical case.

plot_pyseer.R

This script was adapted from https://pyseer.readthedocs.io/en/master/tutorial.html#interpreting-significant-k-mers and plots the association and average effect size of all unitigs with the endpoint 'invasive infection'. Two input files are required:

specta_preprocessing-bruker.R / specta_preprocessing-shimadzu.R

These scripts pick peaks from the raw spectra, which were generated using a mass spectrometer from either Shimadzu (mzXML files) or Bruker (fid files). They can be run with:

Rscript specta_preprocessing-shimadzu.R ./mzXml-processed_Launchpad ./Shimadzu/csv_median ./poso.tgnr.invas.csv
Rscript specta_preprocessing-bruker.R ./fid ./Bruker/csv_median

where ./fid and ./mzXml-processed_Launchpad are the directories containing the raw files, ./csv_median is the output directory, and "poso.tgnr.invas.csv" is a file that maps target plate positions to sample names (required only for the Shimadzu pre-processing). Peak picking is based on the packages MALDIquantForeign and MALDIquant (https://github.com/sgibb/MALDIquant).

binary_from_spectra.R

This script plots the presence / absence of MALDI-TOF MS peaks summarised by phylogroup and papG variant. It requires the following input files:

  • one_per_case_strain_825.txt: A list of the paths to all 825 assemblies
  • pagG_var.csv: papG variants as assigned by the EcVGDB virulence database.
  • mash_phylo.csv: Assignment of each strain to one out of 14 phylogroups. Phylogroups were assigned by calculating Mash distances to 14 reference strains (adapted from https://doi.org/10.1038/s42003-020-01626-5)
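The summary this script produces can be pictured with a small Python sketch: for a given mass, check whether each strain's peak list contains a peak within a tolerance, then report the fraction of strains per group (phylogroup or papG variant) in which it is present. This is only an illustration of the idea, not the R code in `binary_from_spectra.R`. The function names, the ppm tolerance, and the example masses and strain labels are all hypothetical.

```python
from collections import defaultdict


def peak_present(peaks_mz, target_mz, tol_ppm=500.0):
    """True if any picked peak lies within tol_ppm of target_mz."""
    tol = target_mz * tol_ppm / 1e6
    return any(abs(m - target_mz) <= tol for m in peaks_mz)


def presence_by_group(spectra, groups, target_mz, tol_ppm=500.0):
    """Fraction of strains per group whose peak list contains target_mz.

    spectra: dict strain -> list of picked peak m/z values
    groups:  dict strain -> group label (e.g. phylogroup or papG variant)
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [present, total]
    for strain, peaks_mz in spectra.items():
        group = groups[strain]
        counts[group][0] += peak_present(peaks_mz, target_mz, tol_ppm)
        counts[group][1] += 1
    return {g: present / total for g, (present, total) in counts.items()}


# Made-up peak lists and group labels, for illustration only.
toy_spectra = {"s1": [9060.0, 9740.2], "s2": [9060.5], "s3": [9741.0]}
toy_groups = {"s1": "B2", "s2": "B2", "s3": "A"}
summary = presence_by_group(toy_spectra, toy_groups, target_mz=9740.0)
print(summary)
```

A relative (ppm) tolerance is used in the sketch because the absolute mass error of MALDI-TOF measurements typically grows with m/z. A fixed Dalton window would be an equally reasonable choice.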

PCR_snippy_eval.R

This script evaluates which variants were detected in the binding regions of the PCR primers and probes, using the variant caller Freebayes via snippy (https://github.com/tseemann/snippy). It requires the full alignments output by snippy as input files:

  • gapC.core.full.aln
  • papC.core.full.aln
  • papGII.core.full.aln
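As a rough Python sketch of this kind of evaluation, one can read a FASTA-formatted alignment and list every position inside a primer or probe region where a strain differs from the reference. This is not the logic of `PCR_snippy_eval.R`. The reference name `Reference`, the region coordinates, and the rule that only A/C/G/T count as variants (so gaps and masked or low-coverage characters are ignored) are assumptions made for the example.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (upper-cased)."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def region_variants(seqs, ref_name, start, end):
    """List (strain, position, ref_base, alt_base) for mismatches in [start, end).

    Positions are 0-based alignment columns. Only A/C/G/T calls are counted,
    so gap or masked characters are skipped rather than reported.
    """
    ref = seqs[ref_name]
    hits = []
    for name, seq in seqs.items():
        if name == ref_name:
            continue
        for pos in range(start, end):
            if seq[pos] != ref[pos] and seq[pos] in "ACGT":
                hits.append((name, pos, ref[pos], seq[pos]))
    return hits


# Tiny made-up alignment; the primer region is hypothetical (columns 0-7).
toy_alignment = ">Reference\nACGTACGT\n>s1\nACGTACGT\n>s2\nACCTAC-T\n"
toy_seqs = read_fasta(toy_alignment)
variants = region_variants(toy_seqs, "Reference", 0, 8)
print(variants)
```

In the toy alignment, strain `s2` carries one substitution (G to C at column 2), while its gap at column 6 is not reported.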

mass_from_aa-hdeA.py

This script takes an amino acid multiple sequence alignment of the protein HdeA, removes the signal peptide from each sequence and predicts the molecular masses using ProtParam.
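The two steps (trimming the signal peptide, then predicting a mass) can be sketched in plain Python as follows. This is a standard-library stand-in, not the script itself and not ProtParam: it sums standard average residue masses and adds one water molecule. The function names and the `signal_len` parameter are hypothetical, and the sketch assumes there are no insertions or deletions within the signal peptide, so removing a fixed number of residues after stripping gaps is valid.

```python
# Average masses (Da) of amino acid residues, i.e. amino acid minus water.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once per chain for the free N- and C-termini


def mature_sequences(aligned, signal_len):
    """Strip alignment gaps, then drop the first signal_len residues."""
    return {name: seq.replace("-", "")[signal_len:] for name, seq in aligned.items()}


def average_mass(seq):
    """Average molecular mass (Da) of an unmodified peptide chain."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


# Made-up aligned sequences and a made-up signal peptide length of 2.
toy_aligned = {"a": "MK-GA", "b": "MKGGA"}
toy_mature = mature_sequences(toy_aligned, signal_len=2)
toy_masses = {name: average_mass(seq) for name, seq in toy_mature.items()}
print(toy_masses)
```

Sequence variants that change the predicted mass of the mature protein would appear as shifted peaks in a MALDI-TOF spectrum, which is presumably why the mass is computed on the processed rather than the full-length sequence.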
