Skip to content

Commit

Permalink
started LMM Genetic Diversity vignette and added data (#182)
Browse files Browse the repository at this point in the history
*  started LMM Genetic Diversity vignette and added data
* I copied the template to the use folder and placed
  the data in the data folder.
*  did all requested changes
*  we revised the vignette LMM-Genetic-Diversity

sessionInfo:
```R
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] git2r_0.14.0

loaded via a namespace (and not attached):
[1] tools_3.2.2
```
  • Loading branch information
panastheod authored and hlapp committed Jul 5, 2016
1 parent 938d06d commit 997a736
Show file tree
Hide file tree
Showing 2 changed files with 290 additions and 0 deletions.
121 changes: 121 additions & 0 deletions data/GenDivEuglossa.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
Locality Locus Habitat AllelicRichness
CancunNaturalArea(ZonaAgricola)_N_Cancun Egc18 N 5.98
CancunNaturalArea(ZonaAgricola)_N_Cancun ann28 N 3.04
CancunNaturalArea(ZonaAgricola)_N_Cancun ann24 N 4.28
CancunNaturalArea(ZonaAgricola)_N_Cancun ann04 N 1.93
CancunNaturalArea(ZonaAgricola)_N_Cancun ann08 N 12.47
ElQuemado_N_CiudadCarmen Egc18 N 6.94
ElQuemado_N_CiudadCarmen ann28 N 2.41
ElQuemado_N_CiudadCarmen ann24 N 3.23
ElQuemado_N_CiudadCarmen ann04 N 3.18
ElQuemado_N_CiudadCarmen ann08 N 10.89
StaCruz_N_Tizimin Egc18 N 6.28
StaCruz_N_Tizimin ann28 N 2.53
StaCruz_N_Tizimin ann24 N 3.02
StaCruz_N_Tizimin ann04 N 2.45
StaCruz_N_Tizimin ann08 N 11.3
Chetumal_N_Chetumal Egc18 N 5.47
Chetumal_N_Chetumal ann28 N 2.9
Chetumal_N_Chetumal ann24 N 3.1
Chetumal_N_Chetumal ann04 N 2.55
Chetumal_N_Chetumal ann08 N 11.78
Tzacala_N_Merida Egc18 N 5.53
Tzacala_N_Merida ann28 N 3.55
Tzacala_N_Merida ann24 N 3.73
Tzacala_N_Merida ann04 N 1.99
Tzacala_N_Merida ann08 N 11.54
Bethania_N_Campeche Egc18 N 5.52
Bethania_N_Campeche ann28 N 2.88
Bethania_N_Campeche ann24 N 2.88
Bethania_N_Campeche ann04 N 2.26
Bethania_N_Campeche ann08 N 10.84
Tabi_N_Oxkutzcab Egc18 N 6.14
Tabi_N_Oxkutzcab ann28 N 3.08
Tabi_N_Oxkutzcab ann24 N 2.84
Tabi_N_Oxkutzcab ann04 N 2.45
Tabi_N_Oxkutzcab ann08 N 11.66
Mujeres_I_Cancun Egc18 I 4.57
Mujeres_I_Cancun ann28 I 2.89
Mujeres_I_Cancun ann24 I 3.52
Mujeres_I_Cancun ann04 I 2.61
Mujeres_I_Cancun ann08 I 9.85
Cozumel_I_PlayaCarmen Egc18 I 4.54
Cozumel_I_PlayaCarmen ann28 I 2.53
Cozumel_I_PlayaCarmen ann24 I 3.04
Cozumel_I_PlayaCarmen ann04 I 1.99
Cozumel_I_PlayaCarmen ann08 I 8.56
CancunInFrontIsland_D_Cancun Egc18 D 4
CancunInFrontIsland_D_Cancun ann28 D 3.9
CancunInFrontIsland_D_Cancun ann24 D 2.95
CancunInFrontIsland_D_Cancun ann04 D 3
CancunInFrontIsland_D_Cancun ann08 D 11.52
Cancun_D_Cancun Egc18 D 5.28
Cancun_D_Cancun ann28 D 2.83
Cancun_D_Cancun ann24 D 3.84
Cancun_D_Cancun ann04 D 2.44
Cancun_D_Cancun ann08 D 11.9
PlayaCarmen_D_PlayaCarmen Egc18 D 5.47
PlayaCarmen_D_PlayaCarmen ann28 D 2
PlayaCarmen_D_PlayaCarmen ann24 D 2.56
PlayaCarmen_D_PlayaCarmen ann04 D 2.5
PlayaCarmen_D_PlayaCarmen ann08 D 8.81
Chiquil�_D_Chiquil Egc18 D 5.97
Chiquil�_D_Chiquil ann28 D 2.81
Chiquil�_D_Chiquil ann24 D 3.95
Chiquil�_D_Chiquil ann04 D 2.17
Chiquil�_D_Chiquil ann08 D 10.97
PlanAyala_D_CiudadCarmen Egc18 D 6.31
PlanAyala_D_CiudadCarmen ann28 D 2.63
PlanAyala_D_CiudadCarmen ann24 D 3.56
PlanAyala_D_CiudadCarmen ann04 D 2.79
PlanAyala_D_CiudadCarmen ann08 D 10.3
Sucopo_D_Tizimin Egc18 D 6.34
Sucopo_D_Tizimin ann28 D 2.44
Sucopo_D_Tizimin ann24 D 2.99
Sucopo_D_Tizimin ann04 D 2.21
Sucopo_D_Tizimin ann08 D 11.58
ChetumalZonaAgricola_D_Chetumal Egc18 D 6.15
ChetumalZonaAgricola_D_Chetumal ann28 D 2.59
ChetumalZonaAgricola_D_Chetumal ann24 D 3.15
ChetumalZonaAgricola_D_Chetumal ann04 D 2.36
ChetumalZonaAgricola_D_Chetumal ann08 D 11.49
Conkal_D_Merida Egc18 D 5.69
Conkal_D_Merida ann28 D 2.35
Conkal_D_Merida ann24 D 3.46
Conkal_D_Merida ann04 D 1.94
Conkal_D_Merida ann08 D 11.94
China_D_Campeche Egc18 D 5.06
China_D_Campeche ann28 D 2.85
China_D_Campeche ann24 D 3.27
China_D_Campeche ann04 D 2.45
China_D_Campeche ann08 D 10.6
Yohtolin_D_Oxkutzcab Egc18 D 6.17
Yohtolin_D_Oxkutzcab ann28 D 1.99
Yohtolin_D_Oxkutzcab ann24 D 3.18
Yohtolin_D_Oxkutzcab ann04 D 1.99
Yohtolin_D_Oxkutzcab ann08 D 12.56
Cancun_C_Cancun Egc18 C 5.71
Cancun_C_Cancun ann28 C 2.11
Cancun_C_Cancun ann24 C 3.22
Cancun_C_Cancun ann04 C 2.03
Cancun_C_Cancun ann08 C 11.2
TiziminCity_C_Tizimin Egc18 C 5.21
TiziminCity_C_Tizimin ann28 C 1.99
TiziminCity_C_Tizimin ann24 C 3.5
TiziminCity_C_Tizimin ann04 C 3.08
TiziminCity_C_Tizimin ann08 C 11.99
ChetumalCity_C_Chetumal Egc18 C 5.09
ChetumalCity_C_Chetumal ann28 C 2.95
ChetumalCity_C_Chetumal ann24 C 3.22
ChetumalCity_C_Chetumal ann04 C 2.5
ChetumalCity_C_Chetumal ann08 C 12.78
CampecheCity_C_Campeche Egc18 C 5.89
CampecheCity_C_Campeche ann28 C 2.28
CampecheCity_C_Campeche ann24 C 3.11
CampecheCity_C_Campeche ann04 C 2.19
CampecheCity_C_Campeche ann08 C 9.82
OxkutzcabCity_C_Oxkutzcab Egc18 C 5.57
OxkutzcabCity_C_Oxkutzcab ann28 C 2.56
OxkutzcabCity_C_Oxkutzcab ann24 C 2.9
OxkutzcabCity_C_Oxkutzcab ann04 C 2.69
OxkutzcabCity_C_Oxkutzcab ann08 C 12.82
169 changes: 169 additions & 0 deletions use/LMM-Genetic-Diversity.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,169 @@
---
title: "Linear mixed-effects models to control for the variability of microsatellite loci when comparing genetic diversity"
---

# Introduction

There is general interest in comparing the amount of genetic variation among different populations, usually measured as number of alleles, heterozygosity and/or allelic richness.

Genetic variability of populations of a species is important because it can be thought of an indirect measure of the evolvability of those populations. The greater the genetic diversity of a population the bigger the amount of genetic raw material upon which selection can act. A population deprived of its genetic variability is less likely to respond to environmental changes and more vulnerable to the negative effects of inbreeding. From a theoretical point of you measuring (correctly) genetic diversity of a population and comparing it among different naturally occurring or human driven treatments might be important to test the theoretical predictions on impact of drift on effect of population size of relatively isolated (small) populations and on mechanisms of genetic variance (diversity) maintenance. From the conservation point of view the relative genetic diversity of population differentially affected by human intervention might give insight into how resilient populations of a species are to rapid changes of the habitat in which they live. Particularly interesting are comparisons between mainland and island(s) natural and disturbed habitats, between urban and rural habitats. But it would also be desirable to correlate genetic diversity with the extent of isolation (distance from mainland, distance from next fragment), or with the size of islands or fragments.

Petit et al. (2005) were the first to emphasize the importance of controlling for the fact that microsatellite variation is dependent from the length of the microsatellite markers employed. They proposed the use of an ANCOVA framework when comparing genetic variation of microsatellite markers, with the mean number of repeats (MNR) included as covariate.

With an ANCOVA approach we use mean genetic diversity across population as covariate. If allelic richness is what is used as measurement of genetic diversity, mean allelic richness across population at each locus will be used as covariate. Highly variable markers will have a high mean allelic richness; less variable markers will be characterized by a smaller value of mean allelic richness. This allows to account for the portion of variability in the data due the fact that some microsatellite markers are more variable than other.

Here we suggest the use of linear mixed-effects models (LMMs) to control for the variability of microsatellite loci (random part of the mixed model) when comparing genetic diversity. This approach has less assumptions and is more statistically powerful than ANCOVA (Crawley 2007). In a LMM approach, we account for the portion of variance due to intrinsic differences in marker variability by introducing a level of grouping (modeled as locus), within which the chosen measurement of genetic diversity (allelic richness or heterozygosity) will vary around a group (locus) mean. The difference with an ANCOVA approach is that loci will be characterized, not only by their respective means across populations, but also by their respective variances around their respective means.

# Assumptions

- The explanatory variables are related linearly to the response
- The errors have constant variance
- The errors are independent
- The errors are normally distributed

# Resources

## Data

As an example we'll use the comparison in Allelic Richness among populations of the orchid bee Euglossa dilemma in the Yucatan peninsula (Soro et al. submitted). Specifically we ask whether the magnitute of isolation could affect the genetic diversity of Euglossa dilemma.The 24 sites where populations were sampled were assigned to 4 categories corresponding to different habitats : 1) Natural (N); 2) Disturbed (D); 3) City (C); 4) Island (I).
We provide a data file (GenDivEuglossa.txt), where for each locality (population) under a certain habitat (fixed factor), Allelic Richness rariefied to a minimum sample size of 10 individuals(response variable), has been calculated at each locus (random factor).
The data are stored in the file called [GenDivEuglossa.txt](https://github.com/nescent/popgenInfo/blob/master/data/GenDivEuglossa.txt).

## Packages

Packages required:

Loading the required packages:
lm4: Package for fitting linear and generalized linear mixed-effects models.
MuMIn: Package mainly used for model selection and model averaging based on information criteria. Here we use the r.squaredGLMM function to calculate conditional and marginal coefficient of determination.
multcomp: Package used for simultaneous inference in general parametric models. The glht function can be used for multiple comparisons in linear mixed effects models.

```{r, packages, message=FALSE}
library("lme4")
library("MuMIn")
library("multcomp")
```

# Analysis

## Section 1: Load the data

At first we need to load our dataset. We will import "GenDivEuglossa.txt".

```{r, eval=FALSE}
GenDivEuglossa <- read.delim("../data/GenDivEuglossa.txt", header = TRUE)
summary(GenDivEuglossa)
str(GenDivEuglossa)
```

```{r, echo=FALSE}
# Note: the code chunk above will be shown to the reader, but it will not run.
# This code chunk will run, but will not be shown to the reader.
GenDivEuglossa <- read.delim("../data/GenDivEuglossa.txt", header = TRUE)
summary(GenDivEuglossa)
str(GenDivEuglossa)
```

## Section 2: Data analysis/Checking assumptions

Second, we build our model using the function lmer () from the lme4 package. With this command, we create a model with Allellic Richness as response with habitat and locus as fixed and random effects factors respectively. We modeled "habitat" as fixed because the four levels we chose (Natural, Disturbed, City and Island) correspond to four levels of progressive level of "isolation", whose effect on the genetic diversity of Euglossa dilemma, we are interested in. Locus is our random factor since we are only interested to account for its variation. Locus has bee modeled as random because the five markers we used are a random sample of the theoretically many markers we could have used.

Now lets look at the summary of our model. First we get several measures of model fit, including AIC, BIC, log likelihood and deviance. Then we get an estimate of the variance explained by the random effect (Locus). As you can see it is quite different from zero and thus important for our model. Next we have estimates of the fixed effects, with standard errors.

```{r, eval=FALSE}
modelMANAG<-lmer(AllelicRichness~Habitat+(1|Locus), GenDivEuglossa, REML=FALSE)
summary(modelMANAG)
```

```{r, echo=FALSE}
# Note: the code chunk above will be shown to the reader, but it will not run.
# This code chunk will run, but will not be shown to the reader.
modelMANAG<-lmer(AllelicRichness~Habitat+(1|Locus), GenDivEuglossa, REML=FALSE)
summary(modelMANAG)
```

Now, we check our model assumptions. We check for normality and homogeneity by inspecting the residual plot. This plot depicts fitted values on the x-axis and residuals on the y-axis.

```{r, eval=FALSE}
plot(modelMANAG)
```

```{r, echo=FALSE}
# Note: the code chunk above will be shown to the reader, but it will not run.
# This code chunk will run, but will not be shown to the reader.
plot(modelMANAG)
```

## Section 3: Summary statistics

In order to test the effect of our fixed factor 'Habitat', we run a likelihood ratio test, comparing our model (modelMANAG), which includes the fixed factor "Habitat", with a model (modelNULL) that excludes it.

```{r, eval=FALSE}
modelMANAG<-lmer(AllelicRichness~Habitat+(1|Locus), GenDivEuglossa, REML=FALSE)
modelNULL<-lmer(AllelicRichness~1+(1|Locus), GenDivEuglossa, REML=FALSE)
anova(modelMANAG, modelNULL)
```

```{r, echo=FALSE}
# Note: the code chunk above will be shown to the reader, but it will not run.
# This code chunk will run, but will not be shown to the reader.
modelMANAG<-lmer(AllelicRichness~Habitat+(1|Locus), GenDivEuglossa, REML=FALSE)
modelNULL<-lmer(AllelicRichness~1+(1|Locus), GenDivEuglossa, REML=FALSE)
anova(modelMANAG, modelNULL)
```

As a goodness of fit measure of our model, we computed both the conditional and marginal coefficient of determination. We quantify the variance accounted by "Habitat" alone (marginal R2 (Nakagawa and Schielzeth 2013)) and the variance accounted for by marker variability (conditional R2 (Nakagawa and Schielzeth 2013)).

```{r, eval=FALSE}
r.squaredGLMM(modelMANAG)
```

```{r,echo=FALSE}
# Note: the code chunk above will be shown to the reader, but it will not run.
# This code chunk will run, but will not be shown to the reader.
r.squaredGLMM(modelMANAG)
```

Finally we test for differences in Allelic Richness among different habitat types by Tukey HSD post-hoc comparisons.
From the results we see that E. dilemma is particularly resilient to loss of natural habitat and only on islands did E. dilemma show significantly reduced genetic diversity.

```{r, eval=FALSE}
posthoc<-glht(modelMANAG, linfct = mcp(Habitat = "Tukey"))
summary(posthoc)
```

```{r, echo=FALSE}
posthoc<-glht(modelMANAG, linfct = mcp(Habitat = "Tukey"))
summary(posthoc)
```

# Conclusions

LMM analyses showed that locality type had an effect on allelic richness in E. dilemma. The variance accounted for by locality type alone (marginal R2 of the fixed factor habitat (Nakagawa and Schielzeth 2013) was 0.3 %, while the variance accounted for by marker variability (conditional R2 (Nakagawa and Schielzeth 2013)) was 96.4%. This highlights how inherent inter-locus variability in genetic diversity could obscure any signal of genetic variability related to environmental variables unless inherent inter-locus variability is properly accounted for in analyses.
The intensification of forest clearance on the Yucatan Peninsula and concomitant habitat fragmentation seem to have had little, if any, effect on neutral genetic diversity of E. dilemma, measured as allelic richness. Inhabiting an island, on the contrary, seemed to lower the allelic richness of E. dilemma.

# Contributors

- Panagiotis Theodorou (Author)
- Antonella Soro (Author)

# References

Crawley, M. J. (2007).The R Book. West Sussex: J. Wiley.
Nakagawa, S. & Schielzeth, H. (2013). A general and simple method for obtainin R2 from generalized linear mixed-e???ects models. Methods in Ecology and Evolution, 4,133-142.
Petit, R.J., Deguilloux, M.F., Chat, J., Grivet, D., Garnier-Gere, P.& Vendramin, G.G. (2005). Standardizing for microsatellite length in comparisons of genetic diversity. Molecualr Ecology, 14,885-890.
Soro A., Quezada-Euan J. G., Theodorou P., Moritz R.F.A.,Paxton R.J. (2016). The population genetics of two orchid bees suggests high Ne, high dispersal, low diploid male production and only a effect of island isolation in lowering genetic diversity. (submitted to Conservation Genetics)

# Session Information

This shows us useful information for reproducibility. Of particular importance
are the versions of R and the packages used to create this workflow. It is
considered good practice to record this information with every analysis.

```{r, sessioninfo}
options(width = 100)
devtools::session_info()
```



0 comments on commit 997a736

Please sign in to comment.