Nucleotide diversity with GPAT

jewmanchue edited this page Jun 9, 2014 · 12 revisions

This method calculates pi and eHH for bi-allelic SNPs. Both results depend on window size: as a window spans more SNPs, fewer haplotypes are shared across it, so measured diversity is higher. Choose the window size carefully.
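To make the window-size effect concrete, here is a minimal Python sketch of the two statistics for one window of phased haplotypes. It is not the GPAT implementation: the function names are hypothetical, pi is taken as the mean number of pairwise differences per SNP in the window (the normalization GPAT applies may differ), and eHH is taken as the fraction of haplotype pairs that are identical, which matches the 0 (all unique) to 1 (all identical) range described in the usage statement.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Mean pairwise differences per site over all haplotype pairs (assumed normalization)."""
    n_sites = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_diffs / (len(pairs) * n_sites)


def haplotype_homozygosity(haplotypes):
    """Fraction of haplotype pairs that are identical: 0 = all unique, 1 = all identical."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    identical_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return identical_pairs / (n * (n - 1) // 2)


# Four hypothetical haplotypes over a 4-SNP window (0 = reference, 1 = alternate).
haplotypes = ["0000", "0000", "0011", "0111"]

print(nucleotide_diversity(haplotypes))
print(haplotype_homozygosity(haplotypes))

# Shrinking the window to the first 2 SNPs makes more haplotypes identical,
# so homozygosity goes up and apparent diversity goes down.
short = [h[:2] for h in haplotypes]
print(haplotype_homozygosity(short))
```

In this toy input only one of six haplotype pairs is identical over the full 4-SNP window, while three of six are identical over the first two SNPs, which is the window-size dependence described above.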

Usage statement for sequenceDiversity:

INFO: help
INFO: description:
      The sequenceDiversity program calculates two popular metrics of haplotype diversity: pi and
      extended haplotype homozygosity (eHH).  Pi is calculated using the Nei and Li 1979 formulation.
      eHH is a convenient way to think about haplotype diversity.  When eHH = 0 all haplotypes in the window
      are unique and when eHH = 1 all haplotypes in the window are identical. The window size is 20 SNPs.
Output : 5 columns:
         1.  seqid
         2.  start of window
         3.  end of window
         4.  pi
         5.  eHH


INFO: usage: sequenceDiversity --target 0,1,2,3,4,5,6,7 --file my.vcf

INFO: required: t,target     -- argument: a zero-based comma separated list of target individuals corresponding to VCF columns
INFO: required: f,file       -- argument: a properly formatted phased VCF file
INFO: required: y,type       -- argument: type of genotype likelihood: PL, GL or GP
INFO: optional: r,region     -- argument: a tabix compliant region : "seqid:0-100" or "seqid"

INFO: version 1.1.0 ; date: April 2014 ; author: Zev Kronenberg; email : zev.kronenberg@utah.edu
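The --target list is given as zero-based indices rather than sample names. A small Python sketch for building that list from a VCF header line follows. It assumes index 0 refers to the first sample column (the tenth VCF column, after FORMAT); that reading of "zero base" is an assumption, not something the usage statement confirms. The function name and sample names are hypothetical.

```python
def sample_indices(header_line, wanted):
    """Map sample names to zero-based positions among the VCF sample columns."""
    fields = header_line.rstrip("\n").split("\t")
    # The first nine VCF columns are fixed (#CHROM through FORMAT);
    # sample columns start at the tenth. Assumed to be index 0 for --target.
    samples = fields[9:]
    lookup = {name: i for i, name in enumerate(samples)}
    return [lookup[name] for name in wanted]


# Hypothetical header line with three samples.
header = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tsampleA\tsampleB\tsampleC"
)

target = ",".join(str(i) for i in sample_indices(header, ["sampleA", "sampleC"]))
print(target)  # a string suitable for passing to --target
```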

Running provided example:

bin/sequenceDiversity --file samples/scaffold612.phased.vcf.gz --type GP --target 1,20,25,29,30,38,43,46 > pi-ehh.scaffold612
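The resulting file can be loaded for plotting or filtering. The Python sketch below parses the five output columns listed in the usage statement (seqid, start, end, pi, eHH). It assumes the columns are whitespace-delimited, which the usage statement does not specify, and the function name is hypothetical. The two input rows are placeholder values for illustration, not real program output.

```python
import io


def read_windows(handle):
    """Parse sequenceDiversity-style rows: seqid, start, end, pi, eHH."""
    windows = []
    for line in handle:
        fields = line.split()  # assumed whitespace-delimited
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        seqid, start, end, pi, ehh = fields
        windows.append(
            {
                "seqid": seqid,
                "start": int(start),
                "end": int(end),
                "pi": float(pi),
                "ehh": float(ehh),
            }
        )
    return windows


# Placeholder rows standing in for the contents of pi-ehh.scaffold612.
example = io.StringIO(
    "scaffold612\t100\t2500\t0.25\t0.5\n"
    "scaffold612\t2600\t5100\t0.125\t0.75\n"
)

windows = read_windows(example)
for w in windows:
    print(w["seqid"], w["start"], w["end"], w["pi"], w["ehh"])
```

To read the real file, pass an open file handle instead, for example `with open("pi-ehh.scaffold612") as fh: windows = read_windows(fh)`.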