Thetas
Tyler Kent edited this page Sep 16, 2015 · 3 revisions
ANGSD Thetas (diversity stats) calculation. See the ANGSD documentation for full details on this method.
Note: This is a piggybacked method. Run SFS first, because its output is used as input here; that is, run:
bash ./scripts/SFS.sh ./scripts/SFS_TAXON.conf
bash ./scripts/THETAS.sh ./scripts/THETAS_TAXON.conf
with the proper taxon name filled in. See the method page for details on running SFS.
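Because THETAS depends on files written by SFS, it can help to confirm those files exist before starting the second step. The following is a minimal sketch, not part of the repository: `check_sfs_outputs` is a hypothetical helper, and it assumes the SFS outputs carry the names listed on this page and live in a `results` directory relative to where you run it.

```shell
#!/usr/bin/env bash
# Sketch: refuse to start THETAS until the SFS outputs named on this page exist.
# check_sfs_outputs is a hypothetical helper, not one of the repository scripts.
check_sfs_outputs() {
    local taxon="$1" results_dir="${2:-results}" missing=0 f
    for f in "${taxon}_DerivedSFS" "${taxon}_SFSOut.mafs.gz" \
             "${taxon}_SFSOut.saf" "${taxon}_SFSOut.saf.pos.gz"; do
        # -s is true only if the file exists and is not empty.
        if [ ! -s "${results_dir}/${f}" ]; then
            echo "missing or empty: ${results_dir}/${f}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (TAXON is a placeholder; substitute your own taxon name):
# check_sfs_outputs TAXON && bash ./scripts/THETAS.sh ./scripts/THETAS_TAXON.conf
```

The function reports every missing file rather than stopping at the first one, so a single run shows everything that SFS still needs to produce.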
- Script filename: THETAS.sh
- Example config file: THETAS_TAXON.conf
Inputs:
- results/TAXON_DerivedSFS: output from SFS
- results/TAXON_SFSOut.mafs.gz: output from SFS
- results/TAXON_SFSOut.saf: output from SFS
- results/TAXON_SFSOut.saf.pos.gz: output from SFS
- data/TAXON_samples.txt: bam list
- data/TAXON_F.txt: inbreeding coefficients
Output:
- results/TAXON_Diversity.thetas.gz: diversity stats
Options:
- DO_SAF: create SFS (default=2)
- UNIQUE_ONLY: use uniquely mapped reads only (default=1)
- MIN_BASEQUAL: minimum base quality (default=20)
- BAQ: adjust qscores around indels, as SAMtools does (default=1)
- MIN_IND: minimum number of individuals needed to use a site (default=1)
- GT_LIKELIHOOD: estimate genotype likelihoods (default=2)
- MIN_MAPQ: minimum mapping quality to use (default=30)
- N_CORES: number of cores to use (default=32)
- DO_MAJORMINOR: estimate major/minor alleles (default=1)
- DO_MAF: calculate per-site frequencies (default=1)
- DO_THETAS: calculate diversity stats (default=1)
- OVERRIDE: set to true to redo analyses; set to false to skip them (default=false)
- SLIDING_WINDOW: set to true to enable sliding window analysis (default=false)
- WIN: window size for sliding window analysis (default=50000)
- STEP: step size for sliding window analysis (default=10000)
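As an illustration, the options above might appear in a config file as follows. This is a hypothetical excerpt, assuming the config uses shell-style `VAR=value` assignments that the script reads; check the example THETAS_TAXON.conf in the repository for the actual syntax. All values are the documented defaults except SLIDING_WINDOW, which is switched on here to show how the window options fit together.

```shell
# Hypothetical excerpt of THETAS_TAXON.conf, assuming shell-style VAR=value syntax.
# Values are the defaults documented above, except the sliding-window switch.
DO_SAF=2
UNIQUE_ONLY=1
MIN_BASEQUAL=20
BAQ=1
MIN_IND=1
GT_LIKELIHOOD=2
MIN_MAPQ=30
N_CORES=32
DO_MAJORMINOR=1
DO_MAF=1
DO_THETAS=1
OVERRIDE=false
SLIDING_WINDOW=true   # changed from the default (false) to enable windows
WIN=50000             # window size
STEP=10000            # step size
```

With STEP smaller than WIN, as in the defaults, consecutive windows overlap.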
Other variables:
- UNIX_USER: fills in absolute paths for the rest of the config file and script. It should match the name of the user's home directory.
- PROJECT_DIR: absolute path to the location of the analysis folder
- ANGSD_DIR=${PROJECT_DIR}/angsd
- ANC_SEQ: the path to the ancestral sequence file
- REF_SEQ: the path to the reference sequence file
- TAXON: the name of the data set being analyzed. The script looks in the data directory for files with this name: ${TAXON}_samples.txt and ${TAXON}_F.txt. If these files are not present, the script will not work correctly. ${TAXON}_samples.txt contains a list of paths to BAM files, and ${TAXON}_F.txt contains inbreeding coefficients for each of these samples. Check the data folder for an example of each.
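Since the script depends on both per-taxon data files, a quick check before running can catch a missing or mismatched file early. The sketch below is not part of the repository: `check_taxon_files` is a hypothetical helper, and it assumes each file has one entry per line, so that the number of inbreeding coefficients should equal the number of BAM paths.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the two per-taxon data files before running the script.
# check_taxon_files is a hypothetical helper; the one-entry-per-line layout
# (one F value per BAM path) is an assumption, not confirmed by this page.
check_taxon_files() {
    local taxon="$1" data_dir="${2:-data}"
    local samples="${data_dir}/${taxon}_samples.txt"
    local fcoef="${data_dir}/${taxon}_F.txt"
    local f n_samples n_f

    for f in "$samples" "$fcoef"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done

    # Count non-empty lines; '|| true' keeps a zero count from aborting under 'set -e'.
    n_samples=$(grep -c . "$samples" || true)
    n_f=$(grep -c . "$fcoef" || true)

    if [ "$n_samples" -ne "$n_f" ]; then
        echo "line count mismatch: ${n_samples} BAM paths vs ${n_f} F values" >&2
        return 1
    fi
}

# Example (TAXON is a placeholder; substitute your own taxon name):
# check_taxon_files TAXON data
```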