-
Notifications
You must be signed in to change notification settings - Fork 2
gmyc_exercise
The Generalized Mixed Yule Coalescent (GMYC) model is a very popular methods for single locus species delimitation that was initially introduced in 2006 (Pons et al., 2006). The GMYC method (Fujisawa and Barraclough, 2013) uses a speciation (Yule, 1925) and a neutral coalescent model (Hudson, 1990). It strives to maximize the likelihood score by separating/classifying the branches of an ultrametric tree (in units of absolute or relative ages) into two processes; within and between species.
Input: Binary (fully resolved), ultrametric (all tips are equally distant from the root) tree
Software: R package "splits" and web-service
install "splits" :
launch R
$ R install dependencies:
>install.packages("ape", dependencies=T)
>install.packages("MASS", dependencies=T)
>install.packages("paran", dependencies=T)or
> install.packages(c("ape", "paran", "rncl"))install splits:
> install.packages("splits", repos="http://R-Forge.R-project.org")We will also need:
$ mafft BR_cob_57ind.fasta > BR_cob_57ind_mafft.fasta$ raxmlHPC-PTHREADS-SSE3 -f c -s Cyanea.muscle.fasta -m GTRGAMMA -n read_alignmentA reduced file is automatically created: Cyanea.muscle.fasta.reduced in phylip format
Check the difference:
$ head -n 1 BR_cob_57ind_mafft.fasta.reduced
$ grep ">" BR_cob_57ind.reduced.fasta | wc -lQ: How many sequences were in the fasta file and how many are there in the reduced phylip file?
Convert phylip to fasta:
$ phylip_to_fasta.py BR_cob_57ind_mafft.fasta.reducedThe output file will be named BR_cob_57ind_mafft.fasta.reduced.fasta
There are many methods to convert a phylogenetic tree (i.e., branch lengths represent expected number of substitutions per site) to ultrametric (i.e., branch lengths represent time) such as PATHd8 (Britton et al., 2007), r8s (Sanderson, 2003) and methods that estimate the ultrametric tree from the alignment such as Beast (Drummond & Rambaut, 2007). Empirical data suggest that GMYC provides the best results when combined with Beast ultrametric trees (Tang et al., 2014).
Start Beauti
$ path_to_beast_folder/BEASTv1.8.2/bin/beautiFile -> Import Data and select the fasta file BR_cob_57ind_mafft.fasta.reduced.fasta
Tab "Sites": Set the substitution model to HKY + Gamma (Site Heterogeneity Model) in the ideal case you should run a model selection (e.g. with jModelTest) analysis to decide which of the available models is the most fitting for your dataset
Tab "Clocks": For the exercise we will first use the "Clock model: Strict"
The "Lognormal relaxed clock" assumes that the evolutionary rate may vary across the tree according to a log-normal distribution. This option is suitable when we expect that many species are involved in the phylogeny that may evolve at different rates. However, if the dataset includes only few and closely related species the strict clock might be a good option too. This model is simpler and assumes that the substitution rate is constant across the entire tree.
Tab "Trees": For the exercise we will use the "Speciation: Yule process"
The tree prior expresses our assumption on the branching process through time for a group of organisms. The two main options is whether to use a speciation or a coalescent model. The speciation models (pure birth and birth-death models) are more approapriate for datasets involving many species while the coalescent model are intended for datasets involving one or perhaps few very closely related species.
The GMYC method uses the branching times to identify the speciation and the coalescent processes. Therefore, the clock and the tree prior are both very important parameters in getting accurate delimitation results.
Tab "MCMC": Set the chain length to 10000000, and change the file name to "BR_cob_57ind.reduced_Strict"
Once everything is ready click the button on the down right corner "Generate Beast File". The pop-up window notifies us which priors are set to their default values, click "Continue" and save the "BR_cob_57ind.reduced_Strict.xml" file to a new folder named Strict
*** Check the effect of the clock prior by performing one more run under the relaxed log-normal clock prior. Save the xml in a new folder called LN with the name BR_cob_57ind.reduced_LN.xml***
If you have problems creating the Beauti files you can download them from here: BR_cob_57ind.reduced_Strict.xml, BR_cob_57ind.reduced_LN.xml
$ path_to_beast_folder/BEASTv1.8.2/bin/beast -beagle_off BR_cob_57ind.reduced_Strict.xmlomit the -beagle_off option if the beagle library is available
If the analysis is running for too long, you can download the output files for all the tree and clock prior combinations here
$ path_to_tracer/Tracer_v1.6/bin/tracer BR_cob_57ind.reduced_Strict.logIn a good run the ESS values will be high (> 200) and the trace plot for all the parameters should show that a certain plateau was reached
$ ~/SOFTWARE/BEASTv1.8.2/bin/treeannotator -heights median -burninTrees 1000 BR_cob_57ind.reduced_Strict.trees BR_cob_57ind.reduced_Strict.tre!!!!!!!!!!!!!link to BR_cob_57ind.reduced_Strict.tre BR_cob_57ind.reduced_LN.tre!!!!!!!!!!!
Check the output tree with figtree:
$ figtree BR_cob_57ind.reduced_Strict.tre$ R> library(splits)load tree
> tr_strict <- read.nexus("BR_cob_57ind.reduced_Strict.tre")plot tree
> plot(tr_strict)estimate the number of species:
> res_strict <- gmyc(tr_strict)check the results with:
> summary(res_strict)
Result of GMYC species delimitation
method: single
likelihood of null model: 185.4816
maximum likelihood of GMYC model: 202.3221
likelihood ratio: 33.68108
result of LR test: 4.855644e-08***
number of ML clusters: 9
confidence interval: 7-9
number of ML entities: 15
confidence interval: 12-16
threshold time: -0.01305178Check the species assignment:
> spec.list(res_strict)Visualize the results with plots:
> plot(res_strict)SHOW EXAMPLE FIGURES
Repeat the delimitation with the Perform the GMYC species delimiation with the BR_cob_57ind.reduced_LN.tre
Q: Does the delimitation change? Why?