Published and unpublished code for working with DNA sequence.

bensassonlab/scripts

scripts

annotateCENs.pl

Citation: Bensasson, D. “Evidence for a High Mutation Rate at Rapidly Evolving Yeast Centromeres.” BMC Evol Biol 11 (2011): 211. doi:10.1186/1471-2148-11-211.

A simple perl script to annotate Saccharomyces centromeres. It uses a regular expression to recognise the consensus sequence motifs for CDEI and CDEIII described in Baker and Rogers (2005) Genetics 171(4):1463-1475.
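To illustrate the general idea of motif-based annotation, here is a minimal Python sketch. It is not the regular expression used by annotateCENs.pl: the CDEI pattern, the spacer length bounds and the short CDEIII core below are simplified stand-ins chosen for illustration, and the function name is hypothetical. It scans the forward strand only.

```python
import re

# Simplified stand-in patterns (assumptions, not the script's own regex).
CDEI = r"[AG]TCAC[AG]TG"      # IUPAC RTCACRTG
SPACER = r"[ACGT]{70,100}?"   # CDEII plus CDEIII flank; bounds are illustrative
CDEIII_CORE = r"CCGAA"        # only a short conserved core of CDEIII

CEN_RE = re.compile(f"(?P<cdei>{CDEI}){SPACER}(?P<cdeiii>{CDEIII_CORE})")


def find_centromere_candidates(sequence):
    """Return (cdei_start, cdei_end, cdeiii_start, cdeiii_end) tuples.

    Coordinates are 0-based and half-open; only the forward strand is scanned.
    """
    seq = sequence.upper()
    return [m.span("cdei") + m.span("cdeiii") for m in CEN_RE.finditer(seq)]
```

A real annotation would also need to scan the reverse complement and use the full published motifs.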

fourgamete.pl

Citation: Bensasson, D. “Evidence for a High Mutation Rate at Rapidly Evolving Yeast Centromeres.” BMC Evol Biol 11 (2011): 211. doi:10.1186/1471-2148-11-211.

A perl script implementing the four-gamete test for recombination described in Hudson and Kaplan (1985) Genetics 111(1):147-164. This test will not tell you whether or not there is recombination in your data. The script only outputs a summary of the alleles at each site, an alignment of only the segregating sites for easier visualisation of pairs of sites, and a list of the pairs of sites detected by the four-gamete test. Where there are only 2 alleles at each site, a detected pair is one where all 4 possible combinations of alleles occur at the two sites. This implies either homoplasy (convergent mutation) or recombination somewhere between the 2 sites.

For example, the combination below can only be explained by homoplasy or recombination, and fourgamete.pl would report positions 1 and 37 as a pair of sites detected by the four-gamete test:

         position1 position37
taxon1   T         G
taxon2   T         A
taxon3   G         A
taxon4   G         G

For DNA sequence data, homoplasy is a likely explanation for many of these sites. This script does not summarise the data or tell you how many recombination events there are or where.
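The pairwise check described above can be sketched in a few lines of Python. This is an illustration of the test for biallelic sites, not a port of fourgamete.pl: the function name is hypothetical, and skipping any column that contains gaps, ambiguity codes or more than two alleles is a simplifying assumption.

```python
from itertools import combinations


def four_gamete_pairs(alignment):
    """Return 1-based pairs of sites at which all four haplotypes occur.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    """
    seqs = [s.upper() for s in alignment.values()]
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # Keep only clean biallelic columns (assumption: others are ignored).
    columns = {}
    for i in range(len(seqs[0])):
        col = [s[i] for s in seqs]
        if all(base in "ACGT" for base in col) and len(set(col)) == 2:
            columns[i] = col

    pairs = []
    for i, j in combinations(sorted(columns), 2):
        haplotypes = set(zip(columns[i], columns[j]))
        if len(haplotypes) == 4:  # all four allele combinations are present
            pairs.append((i + 1, j + 1))
    return pairs
```

Applied to the four taxa in the example above (reduced to the two sites shown), this would report the single pair of sites.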

vcf2allelePlot.pl

Citation: Bensasson, D., Dicks, J., Ludwig, J.M., Bond, C.J., Elliston, A., Roberts, I.N., James, S.A., 2018. Diverse lineages of Candida albicans live on old oaks. bioRxiv 341032. https://doi.org/10.1101/341032

A perl script that uses a vcf file generated by samtools to draw B-allele frequency plots in R.
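As a rough illustration of the quantity being plotted, the Python sketch below computes the fraction of reads supporting the alternate allele at each site. It assumes the VCF carries the `DP4` INFO tag (reference-forward, reference-reverse, alternate-forward, alternate-reverse read counts) written by samtools/bcftools; whether vcf2allelePlot.pl uses this tag is not stated here, and the function name and `min_depth` threshold are hypothetical.

```python
def allele_frequencies(vcf_lines, min_depth=10):
    """Return (chrom, pos, alt_fraction) for sites with a DP4 tag.

    min_depth is an illustrative filter, not a value taken from the script.
    """
    points = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, info = fields[0], int(fields[1]), fields[7]
        tags = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        if "DP4" not in tags:
            continue
        ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in tags["DP4"].split(","))
        depth = ref_fwd + ref_rev + alt_fwd + alt_rev
        if depth < min_depth:
            continue
        points.append((chrom, pos, (alt_fwd + alt_rev) / depth))
    return points
```

Plotting these fractions against position for each chromosome gives the kind of allele frequency plot described above.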

Scripts used with STRUCTURE

Citation: Tilakaratna, V., Bensasson, D., 2017. Habitat Predicts Levels of Genetic Admixture in Saccharomyces cerevisiae. G3 (Bethesda) 7, 2919–2929. https://doi.org/10.1534/g3.117.041806

structureInfile.pl

Converts DNA sequence alignments in fasta format to STRUCTURE input files that summarise the bases at variable sites.
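A minimal Python sketch of this kind of conversion is shown below. The integer coding (A=1, C=2, G=3, T=4), the use of -9 for missing data and the one-row-per-sequence layout are assumptions for illustration; the actual output of structureInfile.pl may differ, and the function name is hypothetical.

```python
CODES = {"A": 1, "C": 2, "G": 3, "T": 4}  # assumed allele coding
MISSING = -9                              # assumed missing-data value


def structure_rows(alignment):
    """Return one text row per sequence, listing codes at variable sites only.

    alignment: dict mapping name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # A site is variable if at least two different unambiguous bases occur.
    variable = [
        i for i in range(len(seqs[0]))
        if len({s[i] for s in seqs} & set(CODES)) > 1
    ]

    rows = []
    for name, seq in zip(names, seqs):
        codes = [str(CODES.get(seq[i], MISSING)) for i in variable]
        rows.append(" ".join([name] + codes))
    return rows
```

Gaps and ambiguity codes are treated as missing data here, so a column that varies only because of a gap is not counted as variable.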

structureShell.pl

Runs STRUCTURE one time for each value of K in a specified range (e.g. from 1 to 10).
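The loop over K can be sketched in Python as follows. This is only an illustration: it assumes a `structure` executable on the PATH with its `mainparams` and `extraparams` files in the working directory, and the function names and output naming scheme are hypothetical. The `-K`, `-i` and `-o` options are STRUCTURE's command-line overrides for the number of populations, the input file and the output file.

```python
import subprocess


def structure_commands(k_min, k_max, infile, out_prefix, executable="structure"):
    """Build one STRUCTURE command line per value of K (inclusive range)."""
    return [
        [executable, "-K", str(k), "-i", infile, "-o", f"{out_prefix}_K{k}"]
        for k in range(k_min, k_max + 1)
    ]


def run_all(commands):
    """Run each command in turn, stopping if any run fails."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_all(structure_commands(1, 10, "infile.txt", "results"))` would run STRUCTURE ten times, once for each K from 1 to 10.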

structurePrint.pl

Plots STRUCTURE results as barplots using R with user control of colors.

fastatools/

This directory contains scripts used to manipulate DNA sequence in fasta format:

alcat.pl

Citation: Tilakaratna, V., Bensasson, D., 2017. Habitat Predicts Levels of Genetic Admixture in Saccharomyces cerevisiae. G3 (Bethesda) 7, 2919–2929. https://doi.org/10.1534/g3.117.041806

A perl script that concatenates multiple alignment files in fasta format into a single large alignment file.
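The concatenation step can be sketched in Python as below, working on alignments that have already been read into dictionaries. Padding a sequence with gaps when its name is absent from one of the alignments is an assumption made for this sketch; alcat.pl may handle that case differently, and the function name is hypothetical.

```python
def concatenate(alignments):
    """Join alignments end to end, matching sequences by name.

    alignments: list of dicts mapping name -> aligned sequence.
    Names missing from an alignment are padded with gaps (an assumption).
    """
    names = []
    for alignment in alignments:
        for name in alignment:
            if name not in names:
                names.append(name)  # keep first-seen order

    parts = {name: [] for name in names}
    for alignment in alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("each alignment must have sequences of equal length")
        width = lengths.pop()
        for name in names:
            parts[name].append(alignment.get(name, "-" * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}
```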

fa2phylip.pl

This perl script converts sequences in fasta format into alignments in phylip format.
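For orientation, a minimal Python sketch of such a conversion follows. It writes a sequential, relaxed phylip layout (a header line with the number of sequences and sites, then one line per sequence with the name padded to a common width); which phylip variant fa2phylip.pl writes is not stated here, so treat the layout and the function name as assumptions.

```python
def fasta_to_phylip(fasta_text):
    """Convert aligned fasta text to sequential (relaxed) phylip text."""
    records = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = ""
        elif line and name is not None:
            records[name] += line

    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("input must contain sequences of equal length")
    n_sites = lengths.pop()

    width = max(len(n) for n in records)  # pad names to a common width
    lines = [f"{len(records)} {n_sites}"]
    lines += [f"{n.ljust(width)} {seq}" for n, seq in records.items()]
    return "\n".join(lines) + "\n"
```

Strict phylip limits names to 10 characters, so long names would need truncating or renaming for programs that require the strict format.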

faChoose.pl

A perl script to choose a subset of sequences from a fasta file. Currently, the script searches for the names provided by the user in a way that is case insensitive, and the pattern can be matched anywhere in the first word of the fasta name descriptor line.
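The matching rule described above can be sketched in Python as follows. The sketch treats each user-supplied name as a plain substring rather than a regular expression, which is an assumption, and the function name is hypothetical.

```python
def choose_records(fasta_text, patterns):
    """Keep records whose first header word contains any pattern.

    Matching is case insensitive and ignores the rest of the header line.
    """
    wanted = [p.lower() for p in patterns]
    keep = False
    kept_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            first_word = words[0].lower() if words else ""
            keep = any(p in first_word for p in wanted)
        if keep:
            kept_lines.append(line)
    return "\n".join(kept_lines) + ("\n" if kept_lines else "")
```

Because the pattern can match anywhere in the first word, a short pattern may select more sequences than intended.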

fastaLength.pl

A perl script to summarise the length of DNA sequences in a fasta file. The -g option is useful for showing the ungapped length of DNA sequences in an alignment.
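The difference between aligned and ungapped length can be illustrated with a short Python sketch. The `ungapped` flag mirrors the idea of the -g option but is not its implementation, and the function name is hypothetical; only `-` is treated as a gap character here.

```python
def sequence_lengths(fasta_text, ungapped=False):
    """Return a dict of sequence name -> length.

    With ungapped=True, '-' characters are not counted.
    """
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            lengths[name] = 0
        elif name is not None:
            if ungapped:
                line = line.replace("-", "")
            lengths[name] += len(line)
    return lengths
```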

faChrompaint.pl

Citation: Bensasson, D., Dicks, J., Ludwig, J.M., Bond, C.J., Elliston, A., Roberts, I.N., James, S.A., 2018. Diverse lineages of Candida albicans live on old oaks. bioRxiv 341032. https://doi.org/10.1101/341032

A perl script that uses alignment(s) in fasta format to identify the most similar sequences to a study strain in sliding windows. It will optionally produce plots of chromosomes/alignments in R that are colored according to similarity to a panel of reference clades. See example data (in faChrompaintData/) and use in https://doi.org/10.1101/341032.
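The windowed comparison can be sketched in Python as below. This is an illustration only: it uses the proportion of differing sites as the distance, ignores positions with gaps or ambiguity codes, and breaks ties in favour of the first reference. Those choices, the default window size and the function name are all assumptions rather than a description of faChrompaint.pl.

```python
def most_similar_by_window(study, references, window=1000, step=None):
    """For each window, report the reference closest to the study sequence.

    study: aligned sequence of the study strain (uppercase).
    references: dict mapping name -> aligned sequence of the same length.
    Returns (start, end, best_name, proportion_different), 1-based inclusive.
    """
    step = step or window  # default: non-overlapping windows
    results = []
    for start in range(0, len(study) - window + 1, step):
        best_name, best_diff = None, None
        for name, ref in references.items():
            compared = differences = 0
            for a, b in zip(study[start:start + window], ref[start:start + window]):
                if a in "ACGT" and b in "ACGT":  # skip gaps and ambiguity codes
                    compared += 1
                    differences += a != b
            if compared == 0:
                continue  # nothing to compare in this window
            diff = differences / compared
            if best_diff is None or diff < best_diff:
                best_name, best_diff = name, diff
        results.append((start + 1, start + window, best_name, best_diff))
    return results
```

Setting `step` smaller than `window` gives overlapping (sliding) windows; the resulting table is the kind of input that could be used to colour a chromosome by most similar clade.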