Skip to content

mptp_exercise

Pas-Kapli edited this page Oct 16, 2016 · 20 revisions

Introduction (m)ptp

PTP takes as input a phylogenetic tree and tries to determine the transition point from a between- to a within-species process. The single rate PTP assumes that the branch lengths of the phylogenetic tree can be split in two clusters, fitting two distinct exponential distributions; one corresponding to the speciation and one to the coalescent process. The newly introduced multi rate PTP (mPTP) assumes that the branch lengths of each putative species may form different clusters and therefore it assumes each may fit a distinct exponential distribution.

Commands for installing mptp

$ git clone https://github.com/Pas-Kapli/mptp.git
$ cd mptp
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install

Exercise

Infer alignment:

$ mafft BR_cob_57ind.fasta > BR_cob_57ind_mafft.fasta

mafft BR_cob_57ind.fasta

BR_cob_57ind_mafft.fasta

Note: Need to be much more careful with the alignment when using datasets with many indels (e.g. 16S rRNA)

Remove identical sequences:

$ raxmlHPC-PTHREADS-SSE3 -f c -s BR_cob_57ind_mafft.fasta -m GTRGAMMA -n read_alignment

On the screen and the RAxML_info.read_alignment file there is the following information

Found 17 sequences that are exactly identical to other sequences in the alignment.
Normally they should be excluded from the analysis.```	

A reduced file is automatically created: [BR_cob_57ind_mafft.fasta.reduced](https://raw.githubusercontent.com/Pas-Kapli/assets/master/tutorials/Norway2016/data/Branchiomma/ptp/BR_cob_57ind_mafft.fasta.reduced) in phylip format 

Check the difference: 

```bash
$ head -n 1 BR_cob_57ind_mafft.fasta.reduced
$ grep ">"
$ grep ">" BR_cob_57ind_mafft.fasta | wc -l 

Q: How many sequences were in the fasta file and how many are there in the reduced phylip file?

Convert phylip to fasta (we will need it for mptp):

$ phylip_to_fasta.py BR_cob_57ind_mafft.fasta.reduced

The output file will be named BR_cob_57ind_mafft.fasta.reduced.fasta

Infer phylogenetic tree with RAxML:

 $ raxmlHPC-PTHREADS-SSE3 -s BR_cob_57ind_mafft.fasta.reduced -m GTRGAMMA -n Branchiomma -p $RANDOM -T 2 -o BR_076,BR_018 

Using the -o argument we retrieve a rooted phylogeny, if not then we can root it with mptp later

 $ mkdir raxml multi single
 $ mv RAxML* raxml

"Species" delimitation with (m)ptp:

Check the mptp options:

$ cd multi
$ mptp --help

Detect minimum branch length:

 $ mptp --tree_file ../raxml/RAxML_bestTree.Branchiomma --minbr_auto ../../BR_cob_57ind.reduced.fasta -output_file minbr 

Delimitation with the multi-rate algorithm

If the phylogeny is unrooted use the --outgroup taxon_name1,taxon_name2 for rooting the phylogeny prior to the delimitation

 $ mptp --ml --multi --tree_file ../raxml/RAxML_bestTree.Branchiomma --output_file Branchiomma_mptp_multi --outgroup BR_076,BR_018 --outgroup_crop --minbr 0.0022718068 

Check the output files:

$ firefox Branchiomma_mptp_multi.svg

$ nano Branchiomma_mptp_multi.txt 

Branchiomma_mptp_multi.svg Branchiomma_mptp_multi.txt

Repeat the exercise with the single-rate algorithm using the --single instead of the --multi option

Support values:

Perform MCMC sampling for the multi and the single rate PTP:

$ mptp --tree_file ../raxml/RAxML_bestTree.Branchiomma --minbr 0.0022718068 --mcmc 10000000 --mcmc_log --mcmc_sample 50000 --multi --mcmc_runs 2 --output_file mcmc_multi --outgroup BR_076,BR_018 --outgroup_crop 

Output files

tree file with support values in svg format (mcmc_multi.1476550918.combined.svg) and in newick format (mcmc_multi.1476550918.combined.tree)

Plot of the likelihood per mcmc step for the multi rate algorithm

Q: Did the two runs converge to the same score/solution?

Clone this wiki locally