-
Notifications
You must be signed in to change notification settings - Fork 2
gmyc_exercise
The Generalized Mixed Yule Coalescent (GMYC) model is a very popular methods for single locus species delimitation that was initially introduced in 2006 (Pons et al., 2006). The GMYC method (Fujisawa and Barraclough, 2013) uses a speciation (Yule, 1925) and a neutral coalescent model (Hudson, 1990). It strives to maximize the likelihood score by separating/classifying the branches of an ultrametric tree (in units of absolute or relative ages) into two processes; within and between species.
Input: Binary (fully resolved), ultrametric (all tips are equally distant from the root) tree
Software: R package "splits" and web-service
install "splits" :
launch R
$ R install dependencies:
>install.packages("ape", dependencies=T)
>install.packages("MASS", dependencies=T)
>install.packages("paran", dependencies=T)or
> install.packages(c("ape", "paran", "rncl"))install splits:
> install.packages("splits", repos="http://R-Forge.R-project.org")We will also need:
Working directory: ~/workshop_exercises/distance_methods/branchiomma/gmyc
Note: If you didn't create this directory during the linux tutorial create it now using mkdir
$ mafft BR_cob_57ind.fasta > BR_cob_57ind_mafft.fasta$ raxmlHPC-PTHREADS-SSE3 -f c -s BR_cob_57ind_mafft.fasta -m GTRGAMMA -n read_alignmentA reduced file is automatically created: BR_cob_57ind_mafft.fasta.reduced in phylip format
Check the difference:
$ head -n 1 BR_cob_57ind_mafft.fasta.reduced
$ grep ">" BR_cob_57ind_mafft.fasta | wc -lQ: How many sequences were in the fasta file and how many are there in the reduced phylip file?
Convert phylip to fasta:
$ phylip_to_fasta.py BR_cob_57ind_mafft.fasta.reducedThe output file will be named BR_cob_57ind_mafft.fasta.reduced.fasta, rename it into BR_cob_57ind.reduced.fasta:
$ mv BR_cob_57ind_mafft.fasta.reduced.fasta BR_cob_57ind.reduced.fastaThere are many methods to convert a phylogenetic tree (i.e., branch lengths represent expected number of substitutions per site) to ultrametric (i.e., branch lengths represent time) such as PATHd8 (Britton et al., 2007), r8s (Sanderson, 2003) and methods that estimate the ultrametric tree from the alignment such as Beast (Drummond & Rambaut, 2007). Empirical data suggest that GMYC provides the best results when combined with Beast ultrametric trees (Tang et al., 2014).
Start Beauti
$ ~/phylogeny/BEASTv1.8.2/bin/beautiFile -> Import Data and select the fasta file BR_cob_57ind.reduced.fasta
Tab "Sites": Set the substitution model to HKY + Gamma (Site Heterogeneity Model) in the ideal case you should run a model selection (e.g. with jModelTest) analysis to decide which of the available models is the most fitting for your dataset
Tab "Clocks": For the exercise we will first use the "Clock model: Strict"
The "Lognormal relaxed clock" assumes that the evolutionary rate may vary across the tree according to a log-normal distribution. This option is suitable when we expect that many species are involved in the phylogeny that may evolve at different rates. However, if the dataset includes only few and closely related species the strict clock might be a good option too. This model is simpler and assumes that the substitution rate is constant across the entire tree.
Tab "Trees": For the exercise we will use the "Speciation: Yule process"
The tree prior expresses our assumption on the branching process through time for a group of organisms. The two main options is whether to use a speciation or a coalescent model. The speciation models (pure birth and birth-death models) are more approapriate for datasets involving many species while the coalescent model are intended for datasets involving one or perhaps few very closely related species.
The GMYC method uses the branching times to identify the speciation and the coalescent processes. Therefore, the clock and the tree prior are both very important parameters in getting accurate delimitation results.
Tab "MCMC": Set the chain length to 10000000, and change the file name to "BR_cob_57ind.reduced_Strict"
Once everything is ready click the button on the down right corner "Generate Beast File". The pop-up window notifies us which priors are set to their default values, click "Continue" and save the BR_cob_57ind.reduced_Strict.xml file to a new folder named Strict
Check the effect of the clock prior by performing one more run under the relaxed log-normal clock prior. Save the xml in a new folder called LN with the name BR_cob_57ind.reduced_LN.xml
$ ~/phylogeny/BEASTv1.8.2/bin/beast -beagle_off BR_cob_57ind.reduced_Strict.xmlomit the -beagle_off option if the beagle library is available
$ tracer BR_cob_57ind.reduced_Strict.logBR_cob_57ind.reduced_Strict.log
In a good run the ESS values will be high (> 200) and the trace plot for all the parameters should show that a certain plateau was reached. When this is not happening we could try running the analysis longer, sample more frequently, make sure that are priors are appropriate and that there is not something wrong with our data.
Posterior trees under the strict clock: BR_cob_57ind.reduced_Strict.trees.gz and under the relaxed log-normal: BR_cob_57ind.reduced_LN.trees.gz
Extract the tree files:
$ gzip -d BR_cob_57ind.reduced_Strict.trees.gz$ ~/phylogeny/BEASTv1.8.2/bin/treeannotator -heights median -burninTrees 1000 BR_cob_57ind.reduced_Strict.trees BR_cob_57ind.reduced_Strict.treCheck the output tree with figtree:
$ figtree BR_cob_57ind.reduced_Strict.treAnnotated trees: BR_cob_57ind.reduced_Strict.tre and BR_cob_57ind.reduced_LN.tre
$ R> library(splits)load tree
> tr_strict <- read.nexus("BR_cob_57ind.reduced_Strict.tre")plot tree
> plot(tr_strict)estimate the number of species:
> res_strict <- gmyc(tr_strict)check the results:
> summary(res_strict)
Result of GMYC species delimitation
method: single
likelihood of null model: 185.4816
maximum likelihood of GMYC model: 202.3221
likelihood ratio: 33.68108
result of LR test: 4.855644e-08***
number of ML clusters: 9
confidence interval: 7-9
number of ML entities: 15
confidence interval: 12-16
threshold time: -0.01305178Check the species assignment:
> spec.list(res_strict)
GMYC_spec sample_name
1 1 BR_057
2 1 BR_055
3 2 BR_107
4 2 BR_006
5 2 BR_016
6 2 BR_088
7 2 BR_108
8 3 BR_054
9 3 BR_065
10 3 BR_059
11 4 BR_075
12 4 BR_074
13 5 BR_080
14 5 BR_062
... ... ...
39 14 BR_048
40 15 BR_076
Visualize the results with plots:
> plot(res_strict)Save the plots as a pdf file:
> pdf("gmyc_strict.pdf")
> plot(res_strict)
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Hit <Return> to see next plot:
> dev.off()Perform the GMYC species delimiation with the BR_cob_57ind.reduced_LN.tre and the Carabus_mafft_Strict_HKY.tre