Arun Durvasula edited this page May 9, 2019 · 3 revisions

Question: How do I cite ANGSD-wrapper?

Answer: Cite the recently published [ANGSD-wrapper paper](http://onlinelibrary.wiley.com/wol1/doi/10.1111/1755-0998.12578/abstract). If you can't access the full paper, a preprint version is here. Depending on which methods you use in ANGSD-wrapper, you should also cite the ANGSD paper and the papers describing those methods. See our References for details.

Question: Is there an example dataset I can use with ANGSD-wrapper?

Answer: A small dataset consisting of BAM files for 11 maize and 11 teosinte inbred lines is available on figshare. A sample of Tripsacum dactyloides that can be used as an outgroup is also provided. We highly recommend going through the tutorial, which walks you through analyzing this data.

Question: Will ANGSD-wrapper be updated on a regular basis?

Answer: Yes! ANGSD-wrapper is being updated to work with the latest version of ANGSD.

Question: Can I use my own version of ANGSD?

Answer: No, you should use the version of ANGSD included with ANGSD-wrapper.

Question: When I try to compile ANGSD, the following error is reported: /usr/bin/ld: cannot find -lz

Answer: This error means the linker could not find the zlib library. It usually appears when zlib, or its development files, is not installed on your system. If you are working on a computing cluster, contact your system administrator.
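If you want to confirm the diagnosis before contacting anyone, the following is a minimal sketch that tries to link a trivial C program against zlib. The function name `can_link_zlib` is hypothetical and not part of ANGSD or ANGSD-wrapper, and the sketch assumes a C compiler is reachable as `cc` (or through the `CC` variable).

```shell
# Hypothetical helper: returns 0 if a trivial program links against zlib (-lz),
# 1 if linking fails (for example, zlib is missing or no compiler is available).
can_link_zlib() {
    tmp=$(mktemp -d) || return 2
    printf 'int main(void) { return 0; }\n' > "$tmp/t.c"
    if "${CC:-cc}" "$tmp/t.c" -lz -o "$tmp/t" 2>/dev/null; then
        status=0
    else
        status=1
    fi
    rm -rf "$tmp"
    return "$status"
}

# Example usage:
# can_link_zlib && echo "zlib found" || echo "zlib not found by the linker"
```

If the check fails on a machine you administer, installing the zlib development package for your distribution (for example, zlib1g-dev on Debian/Ubuntu or zlib-devel on Fedora/RHEL) is the usual remedy.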

Question: What kind of runtimes can I expect with ANGSD-wrapper?

Answer: Runtime depends on several factors, including (but not limited to) RAM, CPU, sample size, read depth, and genome size. In general, genome-scale analyses take substantial computing power and time (on the order of days) and are best performed on computer clusters. This is especially true of higher-coverage data, which take even longer. Because the example dataset is a subset of a larger sample, it should run in a few hours.

Question: How do I call SNPs using ANGSD and ANGSD-wrapper?

Answer: ANGSD was designed to perform most population genetic analyses without the need to make SNP calls. That is, variants are dealt with probabilistically, and many common descriptors of polymorphism in a population can be calculated without SNP calls. SNP calling is still possible in ANGSD, and ANGSD-wrapper can help with it. See the tutorial for details.

Question: Where can I find the step-by-step process for running ANGSD-wrapper?

Answer: This set of tutorials will demonstrate the most common analyses conducted with ANGSD and ANGSD-wrapper.

Question: Do I need an outgroup sequence to use ANGSD and ANGSD-wrapper?

Answer: It is necessary to have an outgroup sequence to infer the ancestral state of the mutations and to estimate the derived site frequency spectrum. However, many of the descriptive statistics calculated by ANGSD are not dependent on the ancestral state of mutations. If no outgroup sequence is available, include the path to the reference sequence when asked for the outgroup.

Question: Do I need to include inbreeding coefficients in my analyses?

Answer: Yes, ANGSD-wrapper requires the input of inbreeding coefficients in most analyses. If you prefer not to incorporate inbreeding, create a file with 0 on a new line for each individual in your population(s), i.e., assume complete outcrossing, and link to this file in the required places.
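A minimal sketch of creating such a file, assuming you have a sample list with one BAM path per line. The function name `make_zero_inbreeding` and the file names in the usage line are hypothetical; substitute whatever paths your configuration files expect.

```shell
# Hypothetical helper: write one "0" per non-empty line of a sample list,
# i.e. an inbreeding coefficient of 0 (complete outcrossing) for every individual.
make_zero_inbreeding() {
    sample_list=$1   # one sample (e.g. BAM path) per line
    out_file=$2      # inbreeding coefficients file to create
    awk 'NF { print 0 }' "$sample_list" > "$out_file"
}

# Example usage (hypothetical file names):
# make_zero_inbreeding samples.txt no_inbreeding.indF
```

Deriving the file from the sample list keeps the number of zeros in step with the number of individuals, so the two cannot drift apart if samples are added or removed.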

Question: ANGSD complained about not being able to find a chromosome even though it's in my reference sequence.

Answer: This issue is caused by out-of-order chromosomes in your reference. You can fix it by extracting the chromosomes in the desired order:

    for i in $(seq 1 10); do samtools faidx ref.fa "$i" >> sorted_ref.fa; done

where 10 is the number of chromosomes (this assumes the chromosomes are named 1 through 10). Then compress your FASTA with bgzip (samtools faidx cannot index a plain gzip file) and run

    samtools faidx sorted_ref.fa.gz

to get your .fai file.

Question: I need to add an option that isn't supported by ANGSD-wrapper. How do I do this?

Answer: This is pretty straightforward to do. ANGSD-wrapper is essentially a collection of bash scripts, so you just need to modify the corresponding script. For example, if you want to add SNP calling to your ABBABABA analysis, open the Wrappers/Abbababa.sh file in your favorite text editor. Scroll down to the ANGSD call ("${ANGSD_DIR}"/angsd \) and add the options as you would on a command line, ending each continued line with a \. For example:

    -doMaf 2 \
    -doMajorMinor 1 \
    -GL 1 \
    -SNP_pval 1e-6

Now you can run the script as you normally would (./angsd-wrapper Abbababa Configuration_Files/Abbababa_Config) and it will incorporate the changes. Note that many scripts contain several calls to ANGSD. These correspond to whether you are using a regions file, defining regions, or have left the regions blank. You can modify all of the ANGSD calls or just the one you are using.

Question: ABBA-BABA is segfaulting. What's wrong?

Answer: Your FASTA index (.fai) may list the chromosomes in a different order than your regions file. Inspect the .fai (for example, with less) and reorder your regions file to match it.
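For long files, comparing the two by eye is error-prone. The following is a minimal sketch of an automated check, assuming a regions file with one region per line in the form chr:start-end (or a bare chromosome name) and a standard tab-separated .fai whose first column is the sequence name. The function name `regions_in_fai_order` is hypothetical and not part of ANGSD-wrapper.

```shell
# Hypothetical helper: report regions whose chromosome is missing from the .fai
# or appears earlier in the .fai than the chromosome of the preceding region.
# Returns 0 if the regions file follows the .fai order, 1 otherwise.
regions_in_fai_order() {
    awk -F'\t' '
        !NF { next }                          # skip blank lines in either file
        NR == FNR { rank[$1] = FNR; next }    # first file (.fai): remember position of each name
        {
            split($0, parts, ":")             # second file (regions): chromosome is the text before ":"
            chrom = parts[1]
            if (!(chrom in rank)) { print "not in index: " chrom; bad = 1; next }
            if (rank[chrom] < last) { print "out of order: " chrom; bad = 1 }
            last = rank[chrom]
        }
        END { exit bad + 0 }
    ' "$1" "$2"
}

# Example usage (hypothetical file names):
# regions_in_fai_order sorted_ref.fa.gz.fai regions.txt
```

Any line it prints names a chromosome to move or rename in the regions file.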

Question: What value should I use for MIN_MAPQ?

Answer: Our testing in maize suggests that 30 provides a good value for minimizing false positives, but you may lower the value to 20 depending on the organism you are working with.

Question: I have issues running angsd-wrapper shiny graphing

Answer: Please check this issue, which describes a problem involving Jupyter notebooks and the Conda-R-essentials program.