Skip to content

gmyc_exercise

Pas-Kapli edited this page Sep 24, 2016 · 13 revisions

Intorduction

The Generalized Mixed Yule Coalescent (GMYC) model is a very popular methods for single locus species delimitation that was initially introduced in 2006 (Pons et al., 2006). The GMYC method (Fujisawa and Barraclough, 2013) uses a speciation (Yule, 1925) and a neutral coalescent model (Hudson, 1990). It strives to maximize the likelihood score by separating/classifying the branches of an ultrametric tree (in units of absolute or relative ages) into two processes; within and between species.

Input: Binary (fully resolved), ultrametric (all tips are equally distant from the root) tree

Software: R package "splits" and web-service

Necessary software for this tutorial:

install "splits" :

launch R

$ R

install dependencies:

>install.packages("ape", dependencies=T) >install.packages("MASS", dependencies=T) >install.packages("paran", dependencies=T)

or

> install.packages(c("ape", "paran", "rncl"))

install splits:

> install.packages("splits", repos="http://R-Forge.R-project.org")

We will also need:

BEASTv1.8.3

Figtree

Tracer

1.1. Estimate phylogenetic tree

$ muscle Cyanea.phy > Cyanea.muscle.fasta

1.2. Remove identical sequences:

$ raxmlHPC-PTHREADS-SSE3 -f c -s Cyanea.muscle.fasta -m GTRGAMMA -n read_alignment

A reduced file is automatically created: Cyanea.muscle.fasta.reduced in phylip format

Check the difference:

$ head -n 1 Cyanea.muscle.fasta.reduced

$ grep ">"

grep ">" Cyanea.muscle.fasta | wc -l

Q: How many sequences were in the fasta file and how many are there in the reduced phylip file?

Convert phylip to fasta:

$ phylip_to_fasta.py Cyanea.muscle.fasta.reduced

The output file will be named Cyanea.muscle.fasta.reduced.fasta

1.4. Infer ultrametric tree using Beast v.1.8.2:

There are many methods to convert a phylogenetic tree (i.e., branch lengths represent expected number of substitutions per site) to ultrametric (i.e., branch lengths represent time) such as PATHd8 (Britton et al. 2007), r8s (Sanderson 2003) and methods that estimate the ultrametric tree from the alignment such as Beast (Drummond & Rambaut 2007). Empirical data suggest that GMYC provides the best results when combined with Beast ultrametric trees (Tang et al., 2014).

1.4.1 Prepare Beast file with Beauti:

Start Beauti $ path_to_beast_folder/BEASTv1.8.2/bin/beauti File -> Import Data and select the fasta file Cyanea.muscle.fasta.reduced.fasta

Tab "Sites": Set the substitution model to HKY + Gamma (Site Heterogeneity Model) in the ideal case you should run a model selection (e.g. with jModelTest) analysis to decide which of the available models is the most fitting for your dataset

Tab "Clocks": For the exercise we will use the "Lognormal relaxed clock"

The "Lognormal relaxed clock" assumes that the evolutionary rate may vary across the tree accordign to a lognormal distribution. This option is suitable when we expect that many species are involved in the phylogeny that may evolve at different rates. However, if the dataset includes only few and closely related species the strict clock might be a good option too. This model is simpler and assumes that the substitution rate is constant across the entire tree.

Tab "Trees": For the exercise we will use the "Speciation: Yule process"

The tree prior expresses our assumption on the branching process through time for a group of organisms. The two main options is whether to use a speciation or a coalescent model. The speciation models (pure birth and birth-death models) are more approapriate for datasets involving many species while the coalescent model are intended for datasets involving one or perhaps few very closely related species.

The GMYC method uses the branching times in a try to identify the speciation and the coalescent processes. Therefore, the clock and the tree prior are both very important parameters in getting accurate delimitation results.

**Check the effect of these priors by performing two more runs one under the Yule model and strict clock and one with the Coalescent: Constant size and the relaxed lognormal clock **

Tab "MCMC": Set the chain length to 10000000, and change the file name to "Cyanea.yule_LN"

Once everything is ready click the button on the down right corner "Generate Beast File". The pop-up window notifies us which priors are set to their default values, click "Continue" and save the "Cyanea.yule_LN.xml" file to a new folder named yule_LN

1.4.2 Run Beast

$ path_to_beast_folder/BEASTv1.8.2/bin/beast -beagle_off Cyanea.yule_LN.xml

omit the -beagle_off option if the beagle library is available

If the analysis is running for too long, you can download the output files for all the tree and clock prior combinations here

1.4.3 Check the performance of each run with Tracer:

$ path_to_tracer/Tracer_v1.6/bin/tracer Cyanea.yule_LN.log

in a good run the ESS values will be high (> 200) and the trace plot for all the parameters should show that a certain plateau was reached:

EXAMPLE PICTURE

1.4.4 Calculate the Maximum Clade Credibility Tree with Treeannotator

$ ~/phylogeny/BEASTv1.8.2/bin/treeannotator -heights median -burnin 2500000 Cyanea.yule_LN.trees Cyanea.yule_LN_annot_tree

Check the output tree with figtree: $ figtree Cyanea.yule_LN_annot_tree

1.5. Estimate the number of species with the GMYC method:

$ R

> library(splits)

load tree

> yule_ln <- read.nexus("Cyanea.yule_LN_annot_tree")

plot tree

> plot(yule_ln)

estimate the number of species:

> res_yule <- gmyc(yule_ln)

check the results with:

> summary(res_yule)

SHOW EXAMPLE

Check the species assingment:

> summary(res_yule)

SHOW EXAMPLE

visualize the results with figures:

> plot(res_yule)

SHOW EXAMPLE FIGURES

Perform the GMYC species delimiation with the other three trees we produced with Beast

Clone this wiki locally