Tyler Kent edited this page Sep 16, 2015 · 3 revisions

ANGSD Thetas (diversity stats) calculation. See the ANGSD documentation for full details on this method.

Before running this script

Note: This is a piggybacked method: it takes the output of SFS as its input.

Make sure you have run SFS before running this script, i.e. you should run:

bash ./scripts/SFS.sh ./scripts/SFS_TAXON.conf
bash ./scripts/THETAS.sh ./scripts/THETAS_TAXON.conf

with the proper taxon name substituted for TAXON. See the SFS method page for details on running SFS.
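Because THETAS reads the SFS output, it can help to chain the two commands so that THETAS starts only if SFS finished without error. The following is a minimal sketch, assuming both scripts exit with a non-zero status on failure (not confirmed here). The function name run_thetas_pipeline and its optional runner argument are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository): run SFS, then THETAS
# only if SFS succeeded. Assumes it is called from the project root.
run_thetas_pipeline() {
    local taxon="$1"
    # Second argument is the command used to launch each script.
    # Pass "echo" for a dry run that only prints the two command lines.
    local runner="${2:-bash}"

    "$runner" ./scripts/SFS.sh "./scripts/SFS_${taxon}.conf" &&
        "$runner" ./scripts/THETAS.sh "./scripts/THETAS_${taxon}.conf"
}
```

Calling run_thetas_pipeline TAXON echo prints the two commands without running them, which is a cheap way to check that the taxon name expands to the intended config file names.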

Input files

Scripts

  • Script filename: THETAS.sh
  • Example config file: THETAS_TAXON.conf

Necessary input files

  • results/TAXON_DerivedSFS output from SFS
  • results/TAXON_SFSOut.mafs.gz output from SFS
  • results/TAXON_SFSOut.saf output from SFS
  • results/TAXON_SFSOut.saf.pos.gz output from SFS
  • data/TAXON_samples.txt bam list
  • data/TAXON_F.txt inbreeding coefficients
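Before launching THETAS, the files listed above can be checked in one pass. This is a sketch only: the function missing_thetas_inputs is a hypothetical helper, not something THETAS.sh provides, and it assumes the results and data directories sit directly under the project directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of THETAS.sh).
# Usage: missing_thetas_inputs TAXON [PROJECT_DIR]
# Prints one "missing: <path>" line per absent input file and
# returns 1 if anything is missing, 0 otherwise.
missing_thetas_inputs() {
    local taxon="$1"
    local dir="${2:-.}"
    local f
    local missing=0

    for f in \
        "results/${taxon}_DerivedSFS" \
        "results/${taxon}_SFSOut.mafs.gz" \
        "results/${taxon}_SFSOut.saf" \
        "results/${taxon}_SFSOut.saf.pos.gz" \
        "data/${taxon}_samples.txt" \
        "data/${taxon}_F.txt"
    do
        if [ ! -e "${dir}/${f}" ]; then
            echo "missing: ${f}"
            missing=1
        fi
    done
    return "$missing"
}
```

A missing results/ file usually means SFS has not been run yet for that taxon; a missing data/ file has to be supplied by hand.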

Output files

  • results/TAXON_Diversity.thetas.gz diversity stats

Mandatory THETAS_TAXON.conf Variables

  • DO_SAF create SFS (default=2)
  • UNIQUE_ONLY uniquely mapped reads (default=1)
  • MIN_BASEQUAL minimum base quality (default=20)
  • BAQ adjust qscores around indels (as SAMtools) (default=1)
  • MIN_IND minimum number of individuals needed to use site (default=1)
  • GT_LIKELIHOOD estimate genotype likelihoods (default=2)
  • MIN_MAPQ minimum mapping quality to use (default=30)
  • N_CORES number of cores to use (default=32)
  • DO_MAJORMINOR estimate major/minor alleles (default=1)
  • DO_MAF calculate per site frequencies (default=1)
  • DO_THETAS calculate diversity stats (default=1)
  • OVERRIDE set to true to redo analyses; set to false to skip redoing them (default=false)
  • SLIDING_WINDOW set to true to enable sliding window analysis (default=false)
  • WIN window size for sliding window analysis (default=50000)
  • STEP step size for sliding window analysis (default=10000)
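To make WIN and STEP concrete: with the defaults, consecutive windows are 50000 positions wide and start 10000 positions apart, so neighbouring windows overlap by 40000 positions. The sketch below only illustrates that arithmetic. The function window_bounds is hypothetical, it emits full windows only, and the exact coordinates and edge handling that ANGSD uses may differ.

```shell
#!/usr/bin/env bash
# Illustration of sliding-window arithmetic (hypothetical helper, not ANGSD code).
# Usage: window_bounds REGION_LENGTH [WIN] [STEP]
# Prints "start end" for every full window that fits in the region.
window_bounds() {
    local len="$1"
    local win="${2:-50000}"    # same default as WIN
    local step="${3:-10000}"   # same default as STEP
    local start=0

    while [ $((start + win)) -le "$len" ]; do
        echo "${start} $((start + win))"
        start=$((start + step))
    done
}
```

For a region of length 70000 and the default settings, this yields three windows: 0 to 50000, 10000 to 60000, and 20000 to 70000.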

Optional THETAS_TAXON.conf Variables

  • UNIX_USER this variable fills in absolute paths for the rest of the config file and script. It should match the name of the user's home directory.
  • PROJECT_DIR absolute path to the location of the analysis folder
  • ANGSD_DIR path to the ANGSD directory (default=${PROJECT_DIR}/angsd)
  • ANC_SEQ the path to the ancestral sequence file
  • REF_SEQ the path to the reference sequence file
  • TAXON the name of the data set being analyzed. The script looks in the data directory for ${TAXON}_samples.txt, which contains a list of paths to BAM files, and ${TAXON}_F.txt, which contains inbreeding coefficients for each of those samples. If these files are not present, the script will not work correctly. Check the data folder for an example of each.
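Put together, a THETAS_TAXON.conf could look roughly like the sketch below. It assumes the config file is plain shell variable assignments, which the ${PROJECT_DIR} syntax above suggests but which should be checked against the example config in the repository. The mandatory values are the defaults listed above. Every value in the optional block (user name, project path, sequence file paths, taxon name) is a made-up placeholder.

```shell
# Sketch of a THETAS_TAXON.conf, assuming shell-style assignments.

# Optional variables: all values here are hypothetical placeholders.
UNIX_USER=username
PROJECT_DIR=/home/${UNIX_USER}/path/to/project
ANGSD_DIR=${PROJECT_DIR}/angsd
ANC_SEQ=${PROJECT_DIR}/data/ancestral.fa    # placeholder path
REF_SEQ=${PROJECT_DIR}/data/reference.fa    # placeholder path
TAXON=TAXON                                 # replace with the real taxon name

# Mandatory variables: defaults as documented above.
DO_SAF=2
UNIQUE_ONLY=1
MIN_BASEQUAL=20
BAQ=1
MIN_IND=1
GT_LIKELIHOOD=2
MIN_MAPQ=30
N_CORES=32
DO_MAJORMINOR=1
DO_MAF=1
DO_THETAS=1
OVERRIDE=false
SLIDING_WINDOW=false
WIN=50000
STEP=10000
```

N_CORES=32 is only sensible on a machine with that many cores, so it is worth lowering on smaller systems.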