diff --git a/README.md b/README.md
index 3962bd0..34f9b2d 100644
--- a/README.md
+++ b/README.md
@@ -38,7 +38,6 @@ Fortunately, nowadays we can be more sophisticated in our modeling choices and l
This allows us to treat the substitution model as a nuisance parameter and integrate over all _available_ (more on this later) substitution models while simultaneously estimating the phylogeny and other model parameters. Thus, parameter estimates are effectively averaged over different substitution models, weighted by their support.
A useful consequence is that as we are exploring the space of different substitution models we also log the proportion of time that the Markov chain spends in a particular model state. This can be interpreted as the posterior support of a model, which tells us how strongly the data and our prior beliefs support a model in comparison to other competing models.
-Note that **bModelTest** is only able to average over a subset of substitution models that are (a) implemented in BEAST2 and (b) that it knows how to move between. Ideally we would want to integrate over all possible substitution models, but since this is an intractable problem we restrict ourselves to ....
----
@@ -100,7 +99,7 @@ We will continue analyzing the primate mitochondrial data set from the introduct
In this tutorial, we will simplify things by having all four partitions in the alignment evolve under the same Site, Clock and Tree models.
-> In the **Partitions** panel select all four partitions (with **shift+click**) and then click **Link Site Models**, **Link Clock Models** and **Link Trees**. You should rename each model something more informative than noncoding, such as **site**, **clock** and **tree**. Rename models by double-clicking on the drop-down boxes.
+> In the **Partitions** panel select all four partitions (with **shift+click**) and then click **Link Site Models**, **Link Clock Models** and **Link Trees**. You should rename each model something more informative than noncoding, such as **site**, **clock** and **tree**. Rename models by **double-clicking** on the drop-down boxes.
>
The Partition window should now look like [Figure 2](#fig:partitions).
@@ -139,7 +138,7 @@ In the **Clock Model** and **Prior** tabs, we do not need to change any of the d
## Run the analysis in BEAST
-> Open BEAST and choose `primate-mtDNA-bMT.xml` as the BEAST XML File ([Figure 4](#fig:beastRun)). Then click **run**.
+> Open BEAST and choose `primate-mtDNA-bMT.xml` as the BEAST XML File ([Figure 4](#fig:beastRun)). If BEAGLE is installed check the box to use it. Then click **run**.
>
+> **Topic for discussion:** Did the chain ever visit the JC69 model? Why not?
+>
We can also use the output of our analysis to see if a model with (gamma) rate heterogeneity and/or a proportion of invariant sites is supported. If we select **hasGammaRates** in the window on the left and then click **Estimates** we see the proportion of time the chain spent in a model state with rate heterogeneity on (1) versus off (0), and thus the posterior support for a model with rate heterogeneity. Here, the chain seems to remain in a state with rate heterogeneity on, indicating very strong support for heterogeneity ([Figure 8](#fig:hasGammaRates)). We can also select **hasInvariableSites** to see if a model with invariant sites is supported. Here we see that the model spends more time in a model state with invariant sites off (0) than on (1), indicating that the presence of invariant sites are not as strongly supported important ([Figure 9](#fig:hasInvariableSites)). Note that we can also look at the traces for **BMT_gammaShape** and **BMT_ProportionInvariant** to see which values of these two parameters the chain visited.
@@ -225,7 +226,27 @@ There are a few other things we can look at in Tracer as well:
+> **Topic for discussion:** Why does **BMT_Rates.6** not mix poorly?
+>
+> **Hint:** Look at the table of substitution models and the distribution and trace of **BMT_ModelIndicator** in Tracer.
+>
+> **Bonus hint:** Look at the trace for **BMT_Rates.6** and plot the joint-marginal between **BMT_Rates.1** and **BMT_Rates.2**.
+>
+
+
+Select pairs of the **rateAC, ... ,rateGT** parameters (using **shift+click**) and click on the **Joint-Marginal** tab to investigate parameter correlations ([Figure 10](#fig:rateCorrelations)). Try looking at **rateAT** vs. **rateCG** and **rateCG** vs **rateGT**).
+
+
+
+> **Topic for discussion:** It appears that some pairs of the rate parameters are highly correlated for some samples and uncorrelated for the rest. What is happening here? Should we be worried about these parameter correlations?
+>
+> Do you also see correlations between the **BMT_Rates.1 to 6** parameters?
+>
## Analyzing the output using BModelAnalyzer
@@ -233,28 +254,28 @@ There are a few other things we can look at in Tracer as well:
Another really nice feature of bModelTest is that we can graphically analyze the output using the **BModelAnalyser App**.
-> In BEAUti, select **File > Launch Apps** and then launch the **BModelAnalyser App**. A dialogue window should pop up ([Figure 10](#fig:analyzerDialogue)). Enter `primate-mtDNA-bMT.log` as the file to analyze. You can leave the other entries at their default settings but make sure **transitionTransversionSplit** is selected for the Model Set and the box next to **Use Browser For Visualization** is checked. Then click **OK**.
+> In BEAUti, select **File > Launch Apps** and then launch the **BModelAnalyser App**. A dialogue window should pop up ([Figure 11](#fig:analyzerDialogue)). Enter `primate-mtDNA-bMT.log` as the file to analyze. You can leave the other entries at their default settings but make sure **transitionTransversionSplit** is selected for the Model Set and the box next to **Use Browser For Visualization** is checked. Then click **OK**.
>
-After BModelAnalyser runs, a new window should appear in your default web browser that represents the model selection results graphically ([Figure 11](#fig:modelGraph)). This graph depicts the nested relationship of the different substitution models: an arrow pointing from one model to another indicates that the model at the tail is nested within the model at the head of the arrow. As we can see, JC69 is nested within all other models and all other models are nested within GTR.
+After BModelAnalyser runs, a new window should appear in your default web browser that represents the model selection results graphically ([Figure 12](#fig:modelGraph)). This graph depicts the nested relationship of the different substitution models: an arrow pointing from one model to another indicates that the model at the tail is nested within the model at the head of the arrow. As we can see, JC69 is nested within all other models and all other models are nested within GTR.
-The area of the circle surrounding each model is proportional to the the posterior support for that model. The colours represent whether the model is contained within the 95% credible set (blue) or not (red). For the primate data set, the HKY model clearly has the highest posterior support ([Figure 11](#fig:modelGraph)). However, other models such as TN93 and two unnamed models, **121323** and **121123**, also have fairly high posterior support. The six digit model code describes how the different substitution rates are grouped in the order of {% eqinline r_{ac} %}, {% eqinline r_{ag} %}, {% eqinline r_{at} %}, {% eqinline r_{cg} %}, {% eqinline r_{ct} %} and {% eqinline r_{gt} %}. For instance, **121323** is a slight variant of the HKY model with an additional group for the rates {% eqinline r_{ct} %} and {% eqinline r_{gt} %}. The six digit codes for all models are shown in [Figure 6](#fig:modelIndexes).
+The area of the circle surrounding each model is proportional to the the posterior support for that model. The colours represent whether the model is contained within the 95% credible set (blue) or not (red). For the primate data set, the HKY model clearly has the highest posterior support ([Figure 12](#fig:modelGraph)). However, other models such as TN93 and two unnamed models, **121323** and **121123**, also have fairly high posterior support. The six digit model code describes how the different substitution rates are grouped in the order of {% eqinline r_{ac} %}, {% eqinline r_{ag} %}, {% eqinline r_{at} %}, {% eqinline r_{cg} %}, {% eqinline r_{ct} %} and {% eqinline r_{gt} %}. For instance, **121323** is a slight variant of the HKY model with an additional group for the rates {% eqinline r_{ct} %} and {% eqinline r_{gt} %}. The six digit codes for all models are shown in [Figure 6](#fig:modelIndexes).
> **Topic for discussion:** We have used bModelTest to explore a large set of substitution models. But how do we know that any of the substitution models actually fit the observed sequence data well?
diff --git a/figures/beastRun.png b/figures/beastRun.png
index 96b845e..a54fb04 100644
Binary files a/figures/beastRun.png and b/figures/beastRun.png differ
diff --git a/figures/hasGammaRates.png b/figures/hasGammaRates.png
index 82804cc..2d5228c 100644
Binary files a/figures/hasGammaRates.png and b/figures/hasGammaRates.png differ
diff --git a/figures/hasInvariableSites.png b/figures/hasInvariableSites.png
index 6272af4..a2f3053 100644
Binary files a/figures/hasInvariableSites.png and b/figures/hasInvariableSites.png differ
diff --git a/figures/install_bModelTest.png b/figures/install_bModelTest.png
index f4aa408..441c2c6 100644
Binary files a/figures/install_bModelTest.png and b/figures/install_bModelTest.png differ
diff --git a/figures/modelGraph.png b/figures/modelGraph.png
index afbb772..2f25a58 100644
Binary files a/figures/modelGraph.png and b/figures/modelGraph.png differ
diff --git a/figures/modelPosterior.png b/figures/modelPosterior.png
index dc7425b..db73b54 100644
Binary files a/figures/modelPosterior.png and b/figures/modelPosterior.png differ
diff --git a/figures/modelTrace.png b/figures/modelTrace.png
index 6530975..7c485eb 100644
Binary files a/figures/modelTrace.png and b/figures/modelTrace.png differ
diff --git a/figures/partitions.png b/figures/partitions.png
index 15cf14b..a7bca0e 100644
Binary files a/figures/partitions.png and b/figures/partitions.png differ
diff --git a/figures/rateCorrelations.png b/figures/rateCorrelations.png
new file mode 100644
index 0000000..c19bcbd
Binary files /dev/null and b/figures/rateCorrelations.png differ
diff --git a/figures/siteModel.png b/figures/siteModel.png
index e9b66d1..e65f7b2 100644
Binary files a/figures/siteModel.png and b/figures/siteModel.png differ
diff --git a/refs.bib b/refs.bib
index ff8917f..856929d 100644
--- a/refs.bib
+++ b/refs.bib
@@ -40,5 +40,4 @@ @Article{Bouckaert2017
abstract="Reconstructing phylogenies through Bayesian methods has many benefits, which include providing a mathematically sound framework, providing realistic estimates of uncertainty and being able to incorporate different sources of information based on formal principles. Bayesian phylogenetic analyses are popular for interpreting nucleotide sequence data, however for such studies one needs to specify a site model and associated substitution model. Often, the parameters of the site model is of no interest and an ad-hoc or additional likelihood based analysis is used to select a single site model.",
issn="1471-2148",
doi="10.1186/s12862-017-0890-6",
- url="http://dx.doi.org/10.1186/s12862-017-0890-6"
}
\ No newline at end of file