ejodude edited this page Oct 29, 2014 · 7 revisions

## ABBA-BABA

The ABBA-BABA test takes a predefined phylogeny of four taxa supplied by the user and searches for variant sites that conform to either an ABBA or a BABA inheritance pattern, where "A" represents the ancestral and "B" the derived allele state. Under the null model of incomplete lineage sorting with no gene flow between populations, we expect ABBA and BABA sites to occur with equal frequency. An excess of either pattern can indicate gene flow between the two taxa that share the excess of derived alleles.
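As a minimal sketch of the site patterns described above, the following Python snippet classifies a single biallelic site. It is an illustration only, not the tool's implementation: the function name `classify_site`, the 0/1 encoding (0 = ancestral "A", 1 = derived "B"), and the argument order `(((p1, p2), p3), outgroup)` are assumptions made for this example and may not match how `--tree` orders the taxa.

```python
def classify_site(p1, p2, p3, outgroup):
    """Classify one biallelic site as 'ABBA', 'BABA', or None.

    Each argument is 0 (ancestral, 'A') or 1 (derived, 'B').
    Assumed topology for this sketch: (((p1, p2), p3), outgroup).
    """
    pattern = (p1, p2, p3, outgroup)
    if pattern == (0, 1, 1, 0):
        return "ABBA"  # p2 and p3 share the derived allele
    if pattern == (1, 0, 1, 0):
        return "BABA"  # p1 and p3 share the derived allele
    return None  # every other pattern is uninformative for the test


# Hypothetical allele states at four sites, made up for illustration.
sites = [(0, 1, 1, 0), (1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 1, 0)]
labels = [classify_site(*s) for s in sites]
print(labels)
```

Sites that match neither pattern are simply dropped, which mirrors the statement in the tool's help text that non abba-baba sites are ignored.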

These patterns can then be combined using window-based approaches to calculate Patterson's D statistic, which measures an excess of shared derived alleles in either the ABBA (positive D-stat values) or BABA (negative D-stat values) tree topology. Under the null assumption, Patterson's D statistic should be zero.
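To make the statistic concrete, here is a hedged Python sketch using the standard count-based form D = (nABBA - nBABA) / (nABBA + nBABA). The helper names `patterson_d` and `windowed_d`, the use of non-overlapping windows, and the example positions are all assumptions for illustration; the `smoother` tool may window and weight sites differently.

```python
from collections import defaultdict


def patterson_d(n_abba, n_baba):
    """Return D = (nABBA - nBABA) / (nABBA + nBABA), or None if no sites."""
    total = n_abba + n_baba
    if total == 0:
        return None  # D is undefined without informative sites
    return (n_abba - n_baba) / total


def windowed_d(sites, window_size):
    """Compute D per non-overlapping window.

    sites: iterable of (position, abba, baba), with abba and baba as 0/1.
    Returns a dict mapping each window start to its D value.
    """
    counts = defaultdict(lambda: [0, 0])
    for pos, abba, baba in sites:
        start = (pos // window_size) * window_size
        counts[start][0] += abba
        counts[start][1] += baba
    return {start: patterson_d(a, b) for start, (a, b) in sorted(counts.items())}


# Hypothetical sites, made up for illustration only.
example = [(120, 1, 0), (950, 1, 0), (4800, 0, 1),
           (10500, 0, 1), (12000, 0, 1), (19999, 1, 0)]
result = windowed_d(example, 10000)
print(result)
```

In this toy input the first window holds two ABBA sites and one BABA site, giving a positive D, while the second window holds the reverse and gives a negative D of the same magnitude.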

Usage for abba-baba:

INFO: help
INFO: description:
     abba-baba calculates the tree pattern for four indviduals.
     This tool assumes reference is ancestral and ignores non abba-baba sites.
     The output is a boolian value: 1 = true , 0 = false for abba and baba.
     the tree argument should be specified from the most basal taxa to the most derived.

     Example:
     D   C  B   A
     \  / /    /
      \  /    /
       \    /
        \  /
         /
        /
 --tree A,B,C,D

Output : 4 columns :
     1. seqid
     2. position
     3. abba
     4. baba
INFO: usage:  abba-baba-zabba --tree 0,1,2,3 --file my.vcf --type PL

INFO: required: t,tree       -- a zero based comma seperated list of target individuals corrisponding to VCF columns
INFO: required: f,file       -- a properly formatted VCF.
INFO: required: y,type       -- genotype likelihood format ; genotypes: GP,GL or PL;

INFO: version 1.0.0 ; date: April 2014 ; author: Zev Kronenberg & EJ Osborne; email : zev.kronenberg@utah.edu 
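The four-column output listed in the help text can be tallied with a few lines of Python. This is a sketch under stated assumptions: the columns are assumed to be whitespace-separated, the function name `parse_abba_baba` is hypothetical, and the example lines are invented for illustration rather than taken from real output.

```python
def parse_abba_baba(lines):
    """Parse lines of the form: seqid position abba baba.

    Assumes whitespace-separated columns; lines that do not have
    exactly four fields are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        seqid, position, abba, baba = fields
        records.append((seqid, int(position), int(abba), int(baba)))
    return records


# Invented example lines; real output may differ in delimiter and content.
example_lines = [
    "scaffold612 1042 1 0",
    "scaffold612 2310 0 1",
    "scaffold612 5177 1 0",
]
records = parse_abba_baba(example_lines)
n_abba = sum(r[2] for r in records)
n_baba = sum(r[3] for r in records)
print(n_abba, n_baba)
```

Summing the third and fourth columns gives the genome-wide ABBA and BABA counts, which is a quick sanity check before running any windowed analysis.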

Running provided example:

We start by finding positions across the VCF file that have ABBA or BABA inheritance patterns according to the phylogeny specified by the user with the --tree argument. Next, we run the smoother function to calculate Patterson's D statistic in windows of a specified size across the region. Finally, we visualize the result with the plotSmoothed.R script.


cd samples/
../bin/abba-baba --tree 0,1,2,3 --file scaffold612.vcf --type PL > scaffold612.abba-baba.txt
../bin/smoother --format abba-baba --file scaffold612.abba-baba.txt -w 10000 > scaffold612.d-stat.10kb.txt
R --vanilla < ../bin/plotSmoothed.R --args scaffold612.d-stat.10kb.txt abba-baba

The resulting output:

[Figure: with-counts]