**Niche Conservatism**:   Closely related species are found to be ecologically similar.

**Phylogenetic niche conservatism**:   the tendency of lineages to retain their niche-related traits through speciation events and over macroevolutionary time (e.g. Ackerly, 2003; Cooper et al., 2010; Wiens et al., 2010).

**Niche**:   Broadly, the niche is the set of ecological conditions under which a species survives and its individuals reproduce. It is usual to distinguish between the fundamental niche and the realized niche. The fundamental niche is the set of conditions under which a species can exist, including physical conditions, vegetation and available resources. However, interactions with other organisms (competitors, pathogens and mutualists) and the capacity of the species to disperse and establish limit its opportunities to occupy the full potential range, resulting in a smaller actual, or realized, niche (Hutchinson, 1957; Begon et al., 2006; Losos, 2008a; Cooper et al., 2010; Ricklefs, 2010).

**Blomberg's K**:   Blomberg's K (Blomberg et al., 2003) tests the fit of estimated character evolution against a Brownian motion (BM) model of character change (drift). A value close to 1.0 indicates that character evolution fits a BM model; a value > 1.0 indicates that closely related lineages are more similar than expected under a BM model, whereas a value < 1.0 indicates overdispersion: closely related lineages are more different from one another than expected under a BM model.
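A minimal base-R sketch of how K can be computed, assuming the phylogeny has already been converted to its Brownian-motion covariance matrix `C` (entry $C_{ij}$ is the branch length that tips $i$ and $j$ share from the root). The observed ratio of the mean squared error that ignores the tree to the GLS mean squared error that accounts for it is divided by its expectation under BM. The function name `blomberg_k` and the toy four-tip tree are hypothetical; in practice packages such as `phytools` (`phylosig()`) compute K directly from a tree object.

```r
# Blomberg's K for trait values x and a phylogenetic covariance matrix C
blomberg_k <- function(x, C) {
  n <- length(x)
  Cinv <- solve(C)
  one <- rep(1, n)
  # GLS estimate of the root (ancestral) state under Brownian motion
  a_hat <- as.numeric((t(one) %*% Cinv %*% x) / (t(one) %*% Cinv %*% one))
  e <- x - a_hat
  mse0 <- sum(e^2) / (n - 1)                           # ignores the phylogeny
  mse <- as.numeric(t(e) %*% Cinv %*% e) / (n - 1)      # accounts for the phylogeny
  observed <- mse0 / mse
  expected <- (sum(diag(C)) - n / sum(Cinv)) / (n - 1)  # expected ratio under BM
  observed / expected
}

# Hypothetical ultrametric tree ((A,B),(C,D)) of depth 1: each sister pair
# shares 0.9 units of branch length from the root; the two clades share none
C <- matrix(c(1.0, 0.9, 0.0, 0.0,
              0.9, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.9,
              0.0, 0.0, 0.9, 1.0), nrow = 4, byrow = TRUE)
x <- c(0, 0.1, 1, 1.1)   # sister species have similar trait values
blomberg_k(x, C)         # about 2.3: relatives more similar than expected under BM
```

On a star phylogeny (`C` equal to the identity matrix) the observed and expected ratios are both 1, so K is exactly 1 whatever the trait values.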

**Pagel's $\lambda$**:   Pagel's $\lambda$ is similar to K, with values < 1.0 indicating that traits are less similar among species than expected from their phylogenetic relationships and values > 1.0 showing the reverse (Pagel, 1999). A recent simulation study comparing these and other measures of phylogenetic signal (PS) found that $\lambda$ generally outperforms K, although the latter is suitable for models with changing evolutionary rates (Münkemüller et al., 2012).

**Alpha diversity**:   With $x_{i}$ the abundance of species $i$ among $S$ species, $p_{i}=x_{i}/\sum_{i=1}^{S}x_{i}$ and $D_{a}=\left(\sum_{i=1}^{S}p_{i}^{a}\right)^{\frac{1}{1-a}}$. The sum $\sum_{i=1}^{S}p_{i}^{a}$ is a diversity index $H$, and $D_{a}$ is its numbers equivalent. The numbers equivalent of an index, not the index itself, has the properties biologists expect of a true diversity, so the numbers equivalent $D_{a}$ of a diversity index of order $a$ is called the true diversity of order $a$. All diversity indices of a given order have the same true diversity $D_{a}$. Orders higher than 1 are disproportionately sensitive to the most common species, while orders lower than 1 are disproportionately sensitive to rare species. The critical point that weighs all species by their frequency, without favoring either common or rare species, occurs when $a=1$.
    
- $a=0$: species richness, $D_{0}=S$.
- $a\rightarrow1$ (limit): $D_{1}=\exp\left(-\sum_{i=1}^{S}p_{i}\log p_{i}\right)$, the exponential of the Shannon-Wiener index $H'=\log(D_{1})=-\sum_{i=1}^{S}p_{i}\log p_{i}$.
- $a=2$: $D_{2}=1/\sum_{i=1}^{S}p_{i}^{2}$; the corresponding Simpson's (Gini-Simpson) index is $1-\sum_{i=1}^{S}p_{i}^{2}$.

Classical biodiversity measurements (species richness or the myriad diversity indices such as Shannon's) have relied on three main assumptions: (i) all species are equal (only relative abundances establish the relative importance of species), (ii) all individuals are equal (whatever their size), and (iii) species abundances have been correctly assessed with appropriate tools and in similar units (Magurran 2005). But species are not all equal in their effects on ecosystems, since their functional traits matter to ecosystem processes...
![image](0_home_dli_Dropbox_Notes_figure_pasted1.png)
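The orders listed above can be checked numerically. This is a minimal sketch with a hypothetical function `hill()` and made-up abundance counts; it treats $a=1$ as the limiting (Shannon) case and drops absent species before computing proportions.

```r
# True diversity (Hill number) of order a for a vector of abundances x
hill <- function(x, a) {
  p <- x[x > 0] / sum(x)        # relative abundances of the species present
  if (a == 1) {
    exp(-sum(p * log(p)))       # limit as a -> 1: exp of Shannon-Wiener H'
  } else {
    sum(p^a)^(1 / (1 - a))
  }
}

x <- c(50, 30, 10, 5, 3, 1, 1)  # hypothetical counts of 7 species
sapply(c(0, 1, 2), function(a) hill(x, a))   # D0 = 7 (richness), D1, D2

# The corresponding indices computed from the same proportions
p <- x / sum(x)
c(shannon = -sum(p * log(p)), gini_simpson = 1 - sum(p^2))
```

For any community $D_{0}\geq D_{1}\geq D_{2}$, reflecting the growing weight given to common species as the order increases; a perfectly even community gives the same value, $S$, at every order.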

**Beta diversity**:   the amount of compositional change represented in a sample (a set of sample units). The term has also recently been applied in a different way (e.g. Condit et al. 2002), as a rate of decay in species similarity with increasing distance, without respect to explicit environmental gradients. Three applications of beta diversity in the usual sense are:

-  Direct gradient: beta diversity is the amount of change in species composition along a directly measured gradient in environment or time. 
-  Indirect gradient: beta diversity is the length of a presumed environmental or temporal gradient as measured by the species. 
-  No specific gradient: beta diversity measures compositional heterogeneity without reference to a specific gradient. 

**Species turnover**:  is a special case of beta diversity applied to changes in species composition along explicit environmental gradients (Vellend 2001).

**Limitation of species diversity**:   1. Species diversity is only one part of diversity: two sites with the same species diversity can have entirely different functional or phylogenetic diversity. 2. Species names are information-poor: they tell you nothing about the species' functional traits or evolutionary history.

**Classification Tree Models**: Tree classifiers are often called CART (for Classification And Regression Trees), but CART is actually a specific (copyrighted and trademarked) example of such approaches. R contains two different implementations of tree classifiers: `tree()` in package `tree`, and `rpart()` in package `rpart`. Because of its similarity to GLM and GAM (it is deviance-based), the material below is based on `tree()`. Tree classifiers are called "tree" classifiers because their result is a dichotomous key that resembles a tree; since ecologists are generally very familiar with dichotomous keys from taxonomy, the result seems quite intuitive. In the call `test <- tree(factor(y) ~ x, control = tree.control(nobs = 10, mincut = 3))`, we surround the dependent variable with `factor()`. That tells `tree()` that we want a logistic, or classification, tree rather than a regression tree. Normally we do not need `control = tree.control()` inside the call to `tree()`; it was only necessary here because the dataset was so small. Unfortunately, tree classifiers tend to overfit their models severely. Unlike GLM and GAM models, where we can compare the reduction in deviance to the number of degrees of freedom to test for significance, with tree classifiers we need another approach: cross-validation. In GLM or GAM every variable is used, but in tree classifiers a variable is only used if it is the best available variable, and especially in pruned trees many variables never enter the model.

```r
library(tree)

x <- 1:10
y <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)

# factor(y) requests classification mode; tree.control() is only needed
# because the dataset is so small
test <- tree(factor(y) ~ x, control = tree.control(nobs = 10, mincut = 3))
# test <- tree(y ~ x)  # regression mode for a numeric dependent variable
test
# The "root" node is equivalent to the null deviance model in GLM or GAM:
# if you made no splits, what would the deviance be? Here we have 5 presences
# and 5 absences, so -2 * (n0 * log(p0) + n1 * log(p1))
# = -2 * 10 * log(0.5) = 13.86.
summary(test)
plot(test); text(test)
# The height of the vertical lines is proportional to the reduction in deviance
```

**Adaptive radiations**:   Evolutionary lineages that have undergone exceptionally rapid diversification into a variety of lifestyles or ecological niches. Clades that have rapidly diversified by adapting to a wide range of resource zones. 

**Life history theory**:   explores how the schedule and duration of key events in an organism’s lifetime are shaped by natural selection. It helps explain variation in the age at which organisms begin reproducing, the size and number of offspring produced, the amount and type of parental care invested, and even the onset of senescence.

**Supply side ecology**:   in ecological systems composed of local assemblages linked by dispersal, community structure can be more strongly influenced by the supply of new individuals arriving and recruiting to a site (i.e., dispersing larvae or diaspores) than by post-recruitment biotic interactions (credited to J. Roughgarden by Lewin [86]).

**Functional traits spectra**:   describe how several functional traits, including leaf measures, patterns of whole-plant allocation, and attributes of stem hydraulics, interrelate with each other.

**Meta-analysis**:   A quantitative research synthesis that analyzes the results of a set of analyses (Glass, 1976).

**Vote-Counting Methods**:   each study's result is placed in one of three categories: statistically significant in the expected direction (positive findings), statistically significant in the unexpected direction (negative findings), or not statistically significant. The category containing the largest proportion of results is taken to be the statistical trend summarizing the primary literature and is used as evidence to support a given hypothesis. When results fall into one of two categories, a simple sign test can also be used to assess the significance of the overall effect. This method is simple and straightforward, but it tends to be conservative and has low statistical power.
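A minimal sketch of the sign test in base R, assuming the two categories are "significant and positive" versus "significant and negative" (non-significant studies set aside). The tallies are hypothetical.

```r
# Hypothetical tally of significant results from a literature survey
n_pos <- 12   # significant in the expected direction
n_neg <- 3    # significant in the unexpected direction

# Under H0 each significant result is equally likely to fall in either
# direction, so the number of positive findings is Binomial(n, 0.5)
sign_test <- binom.test(n_pos, n_pos + n_neg, p = 0.5)
sign_test$p.value   # two-sided exact p-value
```

Note that the test uses only the direction of each result, not its magnitude or the study's sample size, which is one reason vote counting has low power.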

**Combined Probability Methods**:   These methods combine statistical results from a set of studies based on exact probability values to provide an overall assessment of significance. They are referred to as omnibus tests because they depend only on the exact probabilities from each individual study (so the different sample sizes of the studies are taken into account), rather than on the distribution of the underlying data (Hedges and Olkin, 1985). Examples include the minimum-p method (compare the smallest of the $n$ p-values with the adjusted level $\alpha^{*}=1-(1-\alpha)^{1/n}$, e.g. for $\alpha=0.05$); the sum-of-logs method (compare $X^{2}=-2\sum_{i=1}^{n}\ln p_{i}$ with $\chi^{2}_{df=2n}$); the sum-of-Z method; and the sum-of-p method. However, these methods cannot quantify the magnitude of the effect that is yielding significance, nor can they assess the overall agreement (homogeneity), or lack of it, among studies or groups of studies.
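A sketch of two of the combined-probability methods above in base R. The p-values are hypothetical and assumed to come from independent tests of the same one-sided hypothesis; `fisher_combine()` and `min_p_threshold()` are illustrative names, not functions from any package.

```r
# Sum-of-logs method: X2 = -2 * sum(ln p) follows chi-square with 2n df under H0
fisher_combine <- function(p) {
  X2 <- -2 * sum(log(p))
  c(X2 = X2, p = pchisq(X2, df = 2 * length(p), lower.tail = FALSE))
}

# Minimum-p method: adjusted level that the smallest p-value must beat
min_p_threshold <- function(n, alpha = 0.05) 1 - (1 - alpha)^(1 / n)

p <- c(0.04, 0.10, 0.30, 0.02)   # hypothetical p-values from 4 studies
fisher_combine(p)
alpha_adj <- min_p_threshold(length(p))
reject_min_p <- min(p) < alpha_adj
c(alpha_adj = alpha_adj, reject = reject_min_p)
```

With these values the sum-of-logs method gives a combined p below 0.01, while the minimum-p method does not reject, because no single p-value falls below the adjusted level (about 0.0127 for $n=4$).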

**Modern Meta-Analysis**:   Combines the measures of effect from individual studies into an estimate of the overall strength of the effect, and then determines whether this combined effect size is greater than expected by chance. There are two main steps:

1.  **Individual effect sizes** and their associated variances are calculated for each study in order to place the data from the primary studies on a common scale. Effect sizes can be computed for (1) data reported as the means, sample sizes and standard deviations of both the experimental and control groups; (2) data presented as 2 by 2 contingency tables from a controlled experiment; and (3) data whose summary statistics can be transformed and represented as a correlation coefficient. For case (1), Hedges' $d$ uses the means $\bar{X}$, standard deviations $s$ and sample sizes $N$ of the experimental ($E$) and control ($C$) groups: $d=\frac{\bar{X}^{E}-\bar{X}^{C}}{s}J$, with pooled standard deviation $s=\sqrt{\frac{(N^{E}-1)(s^{E})^{2}+(N^{C}-1)(s^{C})^{2}}{N^{E}+N^{C}-2}}$, small-sample correction $J=1-\frac{3}{4(N^{C}+N^{E}-2)-1}$, and sampling variance $V_{d}=\frac{N^{C}+N^{E}}{N^{C}N^{E}}+\frac{d^{2}}{2(N^{C}+N^{E})}$.
2.  These effect sizes are combined in a statistical summary based on a particular meta-analytic model (a weighted statistical model, with weights based on the studies' sampling variances). From these weighted models, one obtains estimates of the overall effect present in the data, as well as its variance, which can be used to determine whether there is a significant overall effect. An estimate of heterogeneity among studies can also be calculated and used to determine whether the individual studies likely come from one or more statistical populations. Alternative methods (e.g., Bayesian methods) for statistically summarizing effect sizes have also been suggested.
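A sketch of both steps in base R, assuming a fixed-effect model with inverse-variance weights $w_{i}=1/V_{d,i}$ (one common choice; random-effects models add a between-study variance component). The summary statistics are hypothetical and `hedges_d()` is an illustrative helper; the `metafor` package (`escalc()`, `rma()`) provides tested implementations.

```r
# Step 1: Hedges' d and its sampling variance from group summary statistics
hedges_d <- function(mE, mC, sE, sC, nE, nC) {
  s <- sqrt(((nE - 1) * sE^2 + (nC - 1) * sC^2) / (nE + nC - 2))  # pooled SD
  J <- 1 - 3 / (4 * (nC + nE - 2) - 1)                             # small-sample correction
  d <- (mE - mC) / s * J
  v <- (nC + nE) / (nC * nE) + d^2 / (2 * (nC + nE))
  c(d = d, v = v)
}

# Hypothetical summary statistics for three studies
studies <- data.frame(mE = c(12, 15, 9), mC = c(10, 11, 9.5),
                      sE = c(2, 4, 3), sC = c(2, 3.5, 3),
                      nE = c(10, 20, 15), nC = c(10, 18, 15))
es <- t(mapply(hedges_d, studies$mE, studies$mC, studies$sE,
               studies$sC, studies$nE, studies$nC))

# Step 2: fixed-effect model, weighting each study by 1 / variance
w <- 1 / es[, "v"]
d_bar <- sum(w * es[, "d"]) / sum(w)          # overall effect
se_bar <- sqrt(1 / sum(w))                    # its standard error
p_overall <- 2 * pnorm(-abs(d_bar / se_bar))  # test of a non-zero overall effect
# Homogeneity statistic Q, compared with chi-square on k - 1 df
Q <- sum(w * (es[, "d"] - d_bar)^2)
p_Q <- pchisq(Q, df = nrow(es) - 1, lower.tail = FALSE)
c(d_bar = d_bar, se = se_bar, p = p_overall, Q = Q, p_Q = p_Q)
```

A large $Q$ relative to its degrees of freedom suggests the studies do not share a single underlying effect, which is the situation where a random-effects model is usually preferred.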

**Presence-absence Matrix**:   The presence-absence matrix is the fundamental unit of analysis in community ecology and biogeography (McCoy and Heck 1987). In such a matrix, rows are species, columns are sites or samples, and entries are the presence (1) or absence (0) of a species in a site.
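A small sketch of building such a matrix from raw occurrence records in base R; the records are hypothetical, one row per observation, so a species can be recorded more than once at a site.

```r
# Hypothetical occurrence records: one row per (species, site) observation
records <- data.frame(
  species = c("sp1", "sp1", "sp2", "sp3", "sp3", "sp3", "sp2"),
  site    = c("A",   "B",   "B",   "A",   "B",   "C",   "B")
)
# Cross-tabulate species x site counts, then collapse counts to 1/0
counts <- table(records$species, records$site)
pa <- (counts > 0) * 1
pa
rowSums(pa)   # number of sites occupied by each species
colSums(pa)   # species richness of each site
```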

**Effect size**:   Over-reliance on statistical significance (the p-value) is one of the shortcomings of the null hypothesis significance testing (NHST) approach, as an effect of any size will ultimately be shown to be statistically significant with a large enough sample size. For that reason, statisticians recommend reporting not just the probability value but also some measure of effect size. For different statistical analyses:

-   t-test: Cohen's $d$. For the independent-samples t-test, $d=\frac{|\bar{x}_{1}-\bar{x}_{2}|}{s_{p}}$, with pooled standard deviation $s_{p}=\sqrt{\frac{(n_{1}-1)s_{1}^{2}+(n_{2}-1)s_{2}^{2}}{n_{1}+n_{2}-2}}$. For the one-sample t-test, $d=\frac{|\bar{x}-\mu_{0}|}{s_{x}}$.
-   ANOVA: eta squared, $\eta^{2}=SS_{trt}/SS_{total}$, is an effect-size index; in two-way and repeated-measures ANOVA, partial eta squared, $\eta_{p}^{2}=\frac{SS_{trt}}{SS_{trt}+SS_{err}}$, is used instead (in a one-way ANOVA the two are identical).
-   Chi-square tests: for the chi-square goodness-of-fit test, $\hat{\omega}=\sqrt{\sum\frac{(p_{o}-p_{e})^{2}}{p_{e}}}$, where $p_{o}$ is the observed proportion and $p_{e}$ is the expected proportion; a value of 0.1 is a small effect size, 0.3 a medium effect and 0.5 a large effect. For the chi-square test of association, the phi coefficient, $\phi=\sqrt{\frac{\chi^{2}}{n}}$, is an effect-size index for two-by-two tables. For contingency tables with three or more columns or rows, the appropriate effect-size index is Cramér's $V=\sqrt{\frac{\chi^{2}}{n\,(df_{smaller})}}$ (sometimes known as Cramér's phi), where $\chi^{2}$ is the calculated value of chi-square and $df_{smaller}$ is the df for the variable with the smaller number of levels. The `psych` package in R has the phi coefficient built in (`phi()`).
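The indices above can be computed in a few lines of base R. A minimal sketch; the helper names and toy data are hypothetical, and eta squared is taken from a one-way `aov()` fit.

```r
# Cohen's d for two independent samples, using the pooled standard deviation
cohens_d <- function(x1, x2) {
  n1 <- length(x1); n2 <- length(x2)
  sp <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  abs(mean(x1) - mean(x2)) / sp
}

# Eta squared from a one-way ANOVA: SS_trt / SS_total
eta_squared <- function(y, g) {
  ss <- summary(aov(y ~ g))[[1]][["Sum Sq"]]   # c(SS_trt, SS_err)
  ss[1] / sum(ss)
}

# Cramer's V (equal to phi for a 2 x 2 table)
cramers_v <- function(tab) {
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  n <- sum(tab)
  df_smaller <- min(dim(tab)) - 1
  as.numeric(sqrt(chi2 / (n * df_smaller)))
}

cohens_d(c(1, 2, 3), c(3, 4, 5))                                    # 2
eta_squared(c(1, 2, 3, 4, 5, 6), factor(rep(c("a", "b"), each = 3)))  # 27/35
cramers_v(matrix(c(30, 10, 10, 30), nrow = 2))                      # 0.5
```

`correct = FALSE` switches off Yates' continuity correction so that $\chi^{2}$ matches the formula used for phi and Cramér's $V$.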

**Jackknifing**:   Sampling without replacement is associated with a technique known as jackknifing, while sampling with replacement is used in a technique known as bootstrapping.
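To make the contrast concrete, here is a sketch of both resampling schemes for the standard error of a statistic (the mean of a hypothetical simulated sample). The jackknife draws each subsample of size $n-1$ without replacement by leaving one observation out; the bootstrap draws samples of size $n$ with replacement. For the mean, the jackknife standard error reproduces the textbook $s/\sqrt{n}$ exactly, which makes a convenient check.

```r
set.seed(1)
x <- rexp(25)          # hypothetical sample
n <- length(x)
stat <- mean           # any statistic of interest

# Jackknife: recompute the statistic n times, leaving one observation out each time
theta_i <- sapply(seq_len(n), function(i) stat(x[-i]))
se_jack <- sqrt((n - 1) / n * sum((theta_i - mean(theta_i))^2))

# Bootstrap: recompute the statistic on B resamples drawn with replacement
B <- 2000
theta_b <- replicate(B, stat(sample(x, replace = TRUE)))
se_boot <- sd(theta_b)

c(jackknife = se_jack, bootstrap = se_boot, analytic = sd(x) / sqrt(n))
```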

**Null model**:   a pattern-generating model that is based on randomization of ecological data or random sampling from a known or imagined distribution (Gotelli & Graves, 1996). The null model strategy is to construct a model that deliberately excludes a mechanism being tested. We want to know how well the data can be fitted by such a model (Hilborn & Mangel, 1997); in other words, can the patterns in the real data be reproduced in a simple model that does not incorporate biologically important mechanisms? Or do the data appear non-random with respect to the null hypothesis? If so, the analysis provides some evidence in support of the mechanism (although it can never be taken as proof of the mechanism in a strict Popperian framework). Although the point has been made elsewhere (Connor & Simberloff, 1986), it bears repeating that the null hypothesis is not that communities are entirely random or have no structure (Roughgarden, 1983). Rather, it is that community structure is random with respect to the mechanism being tested. The null model can include as much structure as is warranted by the data and the biology, as long as the mechanism of interest can be carefully excluded from the randomization. In practice, most null models are fairly simple in their randomization structure, if only because the kinds of data and biological information needed to construct more sophisticated null models (e.g. Graves & Gotelli, 1983) are usually lacking.
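As an illustration only, here is a sketch of one simple null model for a presence-absence matrix. The test statistic is the C-score, the mean over species pairs of $(r_{i}-S_{ij})(r_{j}-S_{ij})$, where $r_{i}$ is the number of sites occupied by species $i$ and $S_{ij}$ the number of sites shared by species $i$ and $j$. The randomization shuffles each species' occurrences among sites, keeping species frequencies (row totals) fixed while letting site richness vary; this is just one of several possible algorithms, chosen here for simplicity. The matrix and helper names are hypothetical.

```r
# C-score: mean number of "checkerboard units" over all species pairs
c_score <- function(m) {
  r <- rowSums(m)
  shared <- m %*% t(m)                   # S_ij: sites shared by species i and j
  cu <- (r - shared) * t(r - shared)     # (r_i - S_ij) * (r_j - S_ij)
  mean(cu[upper.tri(cu)])
}

# Null model: shuffle each species (row) among sites; row totals stay fixed
null_model_test <- function(m, nsim = 999) {
  obs <- c_score(m)
  sim <- replicate(nsim, c_score(t(apply(m, 1, sample))))
  # one-tailed: is the observed matrix more checkerboarded than the null?
  p <- (sum(sim >= obs) + 1) / (nsim + 1)
  list(observed = obs, null_mean = mean(sim), p = p)
}

set.seed(42)
m <- rbind(sp1 = c(1, 1, 1, 0, 0, 0),
           sp2 = c(0, 0, 0, 1, 1, 1),
           sp3 = c(1, 1, 0, 0, 0, 0),
           sp4 = c(0, 0, 0, 0, 1, 1))
res <- null_model_test(m)
res
```

The mechanism excluded here is any process (such as competition) that would keep particular species apart; everything else about the data, namely how common each species is, is retained in the null.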

**Ordination**:   Ordination techniques are used to order (ordinate) multivariate data. Ordination creates new variables (called principal axes) along which samples are scored or ordered. This ordering may represent a useful simplification of patterns in complex multivariate data sets. Used in this way, ordination is a data-reduction technique: beginning with a set of $n$ variables, the ordination generates a smaller number of variables that still illustrate the important patterns in the data. It can also be used to discriminate or separate samples along axes (Gotelli, 2004).

Ordination should be used cautiously for testing hypotheses; it is best used for data exploration and pattern generation. However, scores from any ordination can be used in classical hypothesis testing as long as the assumptions of the tests are met. Basic ordination uses only community composition: Indirect Gradient Analysis. Constrained ordination studies only the variation that can be explained by the available environmental variables: often called Direct Gradient Analysis.

Distinct flavours of tools:

-   Nonmetric MDS: the most robust method
-   PCA duly despised
-   Flavours of Correspondence Analysis popular
-   Canonical method: Constrained Correspondence Analysis

**Constrained and unconstrained ordination**:   Unconstrained ordination techniques are only based on the species matrix. Constrained ordination techniques use information from both the species and the environmental matrices. The constrained ordination techniques attempt to explain differences in species composition between sites by differences in environmental variables. Constrained ordination studies only the variation that can be explained by the available environmental variables: Often called Direct Gradient Analysis.

**PCA**:   Principal component analysis. The primary use of PCA is to reduce the dimensionality of multivariate data. In other words, we use PCA to create a few key variables (each of which is a composite of many of our original variables) that characterize as fully as possible the variation in a multivariate dataset. The most important attribute of PCA is that the new variables are not correlated with one another, so they can be used in multiple regression or ANOVA without fear of multicollinearity. We can treat the principal component scores as a simple univariate response variable and use ANOVA to test for differences among treatment groups. In general, if you have measured $n$ variables $Y_{j}$ ($j=1,\dots,n$) for each replicate, you can generate $n$ new variables $Z_{i}$ that are all uncorrelated with one another. The new variables can be ordered by the amount of variation they explain in the original data: $Z_{1}$ explains the largest amount of variation and is called the **major axis**. Ordered this way, the $Z_{i}$'s are called **principal components**. Each $Z_{i}=a_{i1}Y_{1}+a_{i2}Y_{2}+\cdots+a_{in}Y_{n}$ is a linear combination of all of the variables; the $a_{ij}$'s are the coefficients for component $i$ that are multiplied by the measured value of variable $j$, and are called **loadings**. The values of the $Z_{i}$'s are mutually uncorrelated (orthogonal) and are called principal component **scores**. **Calculation**: We start by standardizing each variable, $(Y_{j}-\bar{Y}_{j})/s_{j}$, and then compute the sample variance-covariance matrix $C=\begin{bmatrix}s_{1}^{2} & c_{1,2} & \cdots & c_{1,n}\\ c_{2,1} & s_{2}^{2} & \cdots & c_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ c_{n,1} & c_{n,2} & \cdots & s_{n}^{2} \end{bmatrix}$ of the standardized data. $C$ is the **same** as the correlation matrix $P=\begin{bmatrix}1 & r_{1,2} & \cdots & r_{1,n}\\ r_{2,1} & 1 & \cdots & r_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ r_{n,1} & r_{n,2} & \cdots & 1 \end{bmatrix}$ of the raw data.
We then calculate the eigenvalues $\lambda_{1},\dots,\lambda_{n}$ of this matrix and their associated eigenvectors $a_{i}$: the $i$th eigenvalue is the variance of $Z_{i}$, and the loadings $a_{ij}$ are the elements of the eigenvectors. The total variance is $var_{total}=\sum_{i=1}^{n}\lambda_{i}$ (equal to $n$ for standardized data), and the proportion explained by component $i$ is $\lambda_{i}/\sum_{k=1}^{n}\lambda_{k}$. $\mathbf{Eigenvalue=Variance}$.
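The calculation steps can be reproduced with `eigen()` and cross-checked against `prcomp()`. This sketch uses the built-in `iris` measurements purely as example data.

```r
Y <- as.matrix(iris[, 1:4])          # example data shipped with R
Ys <- scale(Y)                       # standardize each column: (Y - mean) / sd
C <- cov(Ys)                         # identical to cor(Y)
eig <- eigen(C, symmetric = TRUE)
lambda <- eig$values                 # eigenvalues = variances of the components
A <- eig$vectors                     # column i holds the loadings of component i
Z <- Ys %*% A                        # principal component scores
round(lambda / sum(lambda), 3)       # proportion of variance explained
round(cor(Z), 10)                    # off-diagonal correlations are zero

# Cross-check with the built-in function (component signs may be flipped)
pca <- prcomp(Y, scale. = TRUE)
all.equal(pca$sdev^2, lambda)
```

The eigenvalues sum to 4, the number of standardized variables, and the variance of each score column equals its eigenvalue.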

**Factor analysis**:   Factor analysis and PCA have a similar goal: the reduction of many variables to a few. Whereas PCA creates new variables as linear combinations of the original variables, factor analysis considers each of the original variables to be a linear combination of some underlying factors. You can think of this as PCA in reverse: $Y_{j}=a_{j1}F_{1}+\cdots+a_{jm}F_{m}+e_{j}$. Each $a_{ji}$ is a factor loading, and the $F_{i}$'s are called common factors. Factor analysis usually begins with a PCA and uses the meaningful components $Z_{i}$ as the initial factors. Because the transformation from $Y$ to $Z$ is orthogonal, there is a set of coefficients $a_{ji}^{*}$ such that $Y_{j}=a_{j1}^{*}Z_{1}+\cdots+a_{jn}^{*}Z_{n}$. For a factor analysis, we keep only the first $m$ components, such as those determined to be important in the PCA using a scree plot: $Y_{j}=a_{j1}^{*}Z_{1}+\cdots+a_{jm}^{*}Z_{m}+e_{j}$. To transform the $Z_{i}$'s into factors, we divide each of them by its standard deviation $\sqrt{\lambda_{i}}$, the square root of its corresponding eigenvalue. This gives us a factor model $Y_{j}=b_{j1}F_{1}+\cdots+b_{jm}F_{m}+e_{j}$, where $F_{i}=Z_{i}/\sqrt{\lambda_{i}}$ and $b_{ji}=a_{ji}^{*}\sqrt{\lambda_{i}}$.

Unlike the $Z_{i}$'s generated by PCA, the factors $F_{i}$ are not unique: we can generate new factors $F_{i}^{*}$ that are linear combinations of the original factors, $F_{i}^{*}=d_{i1}F_{1}+\cdots+d_{im}F_{m}$. We identify the values of the coefficients that give us the factors that are easiest to interpret. This process is called **rotating** the factors.

**Rotating**:   There are two kinds of factor rotations: **orthogonal rotation** results in new factors that are uncorrelated with one another, whereas **oblique rotation** results in new factors that may be correlated with one another. The most common type of orthogonal factor rotation is **varimax rotation**, which maximizes the sum, over factors, of the variances of the squared loadings: $V=\sum_{i=1}^{m}\mathrm{var}_{j}\left(b_{ji}^{2}\right)$, where $b_{ji}$ are the rotated loadings and each variance is taken over the variables $j$.
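A sketch of the principal-component route to factor analysis followed by a varimax rotation, using `stats::varimax()`. It assumes $m=2$ factors and uses the built-in `iris` data only as an example; note that `stats::factanal()` estimates factors by maximum likelihood instead, so its loadings will differ.

```r
Y <- as.matrix(iris[, 1:4])           # example data shipped with R
eig <- eigen(cor(Y), symmetric = TRUE)
m <- 2                                # number of factors kept (e.g. from a scree plot)
# Principal-component extraction: b_ji = a*_ji * sqrt(lambda_i)
B <- eig$vectors[, 1:m] %*% diag(sqrt(eig$values[1:m]))
rownames(B) <- colnames(Y)
communality <- rowSums(B^2)           # share of each variable's variance kept by the m factors

rot <- varimax(B)                     # orthogonal rotation (Kaiser-normalized by default)
B_rot <- unclass(rot$loadings)
round(B_rot, 3)
round(rot$rotmat, 3)                  # the orthogonal rotation matrix
```

Because the rotation is orthogonal, each variable's communality is unchanged; only the way that variance is shared among the factors changes, which is what makes the rotated loadings easier to interpret.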

**Principal Coordinates analysis**:   PCoA is a method used to ordinate using any measure of distance. PCA and factor analysis are used when we analyze quantitative multivariate data and wish to preserve Euclidean distances between observations; PCA is a special case of PCoA with Euclidean distances. Procedure:

1.  Generate a distance or dissimilarity matrix $D$ with elements $d_{ij}$ from the data. Any distance measure can be used; $D$ is a square matrix whose dimension is the number of observations.
2.  Transform $D$ into a new matrix $D^{*}$ with elements $d_{ij}^{*}=-\frac{1}{2}d_{ij}^{2}$. This converts the distance matrix into a coordinate matrix that preserves the distance relationships between the transformed variables and the original data.
3.  Center the matrix $D^{*}$ to create the matrix $\Delta$ with elements $\delta_{ij}=d_{ij}^{*}-\bar{d}_{i\cdot}^{*}-\bar{d}_{\cdot j}^{*}+\bar{d}_{\cdot\cdot}^{*}$, where the bars denote row, column and overall means.
4.  Compute the eigenvalues and eigenvectors of $\Delta$. The eigenvectors $v_{k}$ must be scaled so that $\sqrt{v_{k}^{T}v_{k}}=\sqrt{\lambda_{k}}$.
5.  Write the eigenvectors $v_{k}$ as columns, with each row corresponding to an observation. The entries are the new coordinates of the objects in principal coordinates space.
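The five steps can be sketched directly in base R and compared with `cmdscale()`, base R's classical (metric) scaling. The helper name `pcoa()` and the four-point data set are hypothetical; the helper assumes the first `k` eigenvalues are positive, which need not hold for non-Euclidean dissimilarities.

```r
# Principal coordinates analysis following steps 1-5
pcoa <- function(d, k = 2) {
  D <- as.matrix(d)                          # step 1: square distance matrix
  m <- nrow(D)
  Dstar <- -0.5 * D^2                        # step 2: d*_ij = -d_ij^2 / 2
  J <- diag(m) - matrix(1 / m, m, m)         # centering matrix
  Delta <- J %*% Dstar %*% J                 # step 3: double-centering
  eig <- eigen(Delta, symmetric = TRUE)      # step 4
  # eigen() returns unit-length vectors; rescale each to length sqrt(lambda_k)
  pts <- sweep(eig$vectors[, 1:k, drop = FALSE], 2, sqrt(eig$values[1:k]), "*")
  list(points = pts, eig = eig$values)       # step 5: rows = observations
}

# Hypothetical 2-D Euclidean data, so PCoA should recover the distances exactly
X <- matrix(c(0, 0,  3, 0,  0, 4,  1, 1), ncol = 2, byrow = TRUE)
d <- dist(X)
res <- pcoa(d, k = 2)
res$points
cmdscale(d, k = 2)    # classical scaling in base R (signs may differ)
```

Because PCA is PCoA on Euclidean distances, `res$points` here also matches `prcomp(X)$x` up to the sign of each axis.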

**Correspondence analysis**:   also known as reciprocal averaging (RA) or indirect gradient analysis, CA is used to examine the relationship of species assemblages to site characteristics. The sites usually are selected to span an environmental gradient, and the underlying hypothesis or model is that species abundance distributions are **unimodal** and approximately normal across the environmental gradient. *CA is a special case of PCoA with a chi-square distance matrix*. However, like other ordination methods, CA has the unfortunate mathematical property of compressing the ends of an environmental gradient and accentuating the middle; this can result in ordination plots that are curved into an **arch or a horseshoe shape**, even when the samples are evenly spaced along the environmental gradient. A reliable and interpretable ordination technique should preserve the distance relationships between points. For many common distance measures, the horseshoe effect distorts the distance relationships among the new variables created by the ordination.
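One standard way to compute CA is a singular value decomposition of the matrix of chi-square standardized residuals $(p_{ij}-r_{i}c_{j})/\sqrt{r_{i}c_{j}}$, where $p_{ij}$ are the table proportions and $r_{i}$, $c_{j}$ the row and column masses. This sketch takes that route, rather than iterative reciprocal averaging, on a hypothetical site-by-species table. The total inertia (sum of eigenvalues) equals $\chi^{2}/n$ of the table, which gives a built-in check.

```r
# Hypothetical sites x species abundance table along a gradient
N <- matrix(c(10, 5, 0, 0,
               6, 8, 2, 0,
               1, 6, 7, 3,
               0, 1, 5, 9), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("site", 1:4), paste0("sp", 1:4)))
P <- N / sum(N)                       # correspondence matrix
r <- rowSums(P)                       # row (site) masses
cm <- colSums(P)                      # column (species) masses
# Chi-square standardized residuals
S <- diag(1 / sqrt(r)) %*% (P - r %o% cm) %*% diag(1 / sqrt(cm))
sv <- svd(S)
inertia <- sv$d^2                     # eigenvalues; the last one is ~0
# Principal coordinates of sites and species on the first two axes
site_scores    <- diag(1 / sqrt(r))  %*% sv$u[, 1:2] %*% diag(sv$d[1:2])
species_scores <- diag(1 / sqrt(cm)) %*% sv$v[, 1:2] %*% diag(sv$d[1:2])
round(inertia / sum(inertia), 3)      # proportion of inertia per axis
```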

**Detrended correspondence analysis**:   DCA is used to remove the horseshoe effect of CA. However, it is no longer recommended, since the horseshoe is a mathematical consequence of applying most distance measures to species that have unimodal responses to underlying environmental gradients. The use of alternative distance measures is a better solution to the horseshoe problem.

**Non-Metric Multidimensional scaling**:   PCA, PCoA, CA and factor analysis are similar in that the distances between observations in multivariate space are preserved **to the extent** possible after the multivariate data have been reduced to a smaller number of composite variables. In contrast, the goal of NMDS is to end up with a plot in which dissimilar objects are placed far apart in the ordination space while similar objects are placed close together. **Only the rank ordering** of the original distances or dissimilarities is preserved.

1.  Generate a distance or dissimilarity matrix from the data. Any distance measure $d_{ij}$ can be used; $D$ is a square distance matrix between observations.
2.  Choose the number of dimensions (axes) $n$ to be used to draw the ordination, usually 2 or 3.
3.  Start the ordination by placing the $m$ observations in the $n$-dimensional space. Subsequent analyses depend strongly on this initialization, because NMDS finds its solution by local minimization (similar to non-linear regression). If geographic information is available, it may be used as a good starting point. Alternatively, the output from another ordination such as PCoA can be used to determine the initial positions of observations in an NMDS.
4.  Compute new distances $\delta_{ij}$ between observations in the initial configuration. Normally Euclidean distances are used.
5.  Regress $\delta_{ij}$ on $d_{ij}$ to get a set of predicted values $\hat{\delta}_{ij}$. For example, if the linear regression $\delta_{ij}=\beta_{0}+\beta_{1}d_{ij}+\varepsilon_{ij}$ is used, then $\hat{\delta}_{ij}=\hat{\beta}_{0}+\hat{\beta}_{1}d_{ij}$. The usual choices are linear, polynomial, or monotone (nonparametric) regression. Monotone regression, a common choice in NMDS, fits a step function that is constrained never to decrease from left to right.
6.  Compute a goodness of fit **(stress)** between $\delta_{ij}$ and $\hat{\delta}_{ij}$. Stress is computed on the lower triangle of matrix $D$, i.e. over all pairs of observations $i<j$. In essence, stress represents the extent to which the rank order of the fitted distances disagrees with the rank order of the observed dissimilarities.
    $$stress=\sqrt{\frac{\sum_{i<j}(\delta_{ij}-\hat{\delta}_{ij})^{2}}{\sum_{i<j}\hat{\delta}_{ij}^{2}}}$$
7.  Change the positions of the $m$ observations in $n$-dimensional space slightly in order to reduce the stress.
8.  Repeat steps 4-7 until the stress can no longer be reduced.
9.  Plot the positions of the $m$ observations in $n$-dimensional space for which stress is minimal. This plot illustrates "relatedness" among observations. Most NMDS programs rotate the final solution using PCA for easier interpretation.
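The procedure above is implemented by `MASS::isoMDS()` (Kruskal's non-metric MDS with monotone regression), and `MASS::Shepard()` returns the monotone fit used in step 5. This sketch mirrors the steps on the built-in `swiss` data, which is only example data, starting from a PCoA (`cmdscale()`) configuration as suggested in step 3. Ecologists often use `vegan::metaMDS()` instead, which adds multiple random starts to reduce the risk of a poor local minimum.

```r
library(MASS)                                   # recommended package shipped with R
X <- as.matrix(swiss[, -1])                     # 47 provinces x 5 variables
d <- dist(X)                                    # step 1: dissimilarity matrix
init <- cmdscale(d, k = 2)                      # steps 2-3: two axes, PCoA start
fit <- isoMDS(d, y = init, k = 2, trace = FALSE)  # steps 4-8: iterative stress reduction
fit$stress                                      # final stress, reported in percent

# Shepard diagram: configuration distances vs. original dissimilarities,
# with the monotone (isotonic) regression step function of step 5
sh <- Shepard(d, fit$points)
plot(sh, pch = ".")
lines(sh$x, sh$yf, type = "S")

# Step 9: rotate the final configuration to principal axes for interpretation
rotated <- prcomp(fit$points)$x
plot(rotated, type = "n")
text(rotated, labels = rownames(X), cex = 0.6)
```

The PCA rotation in the last step only re-orients the configuration; it leaves all inter-point distances, and therefore the stress, unchanged.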

**MANOVA**:   The validity of classical multivariate analysis of variance (MANOVA) relies on certain assumptions, including the independence of the sample units (e.g., row vectors), the multivariate normality of errors, and the homogeneity of variance-covariance matrices among the groups. The classical MANOVA test statistics (i.e., Wilks' lambda, the Hotelling-Lawley trace, Pillai's trace, and Roy's largest root criterion) are designed specifically to test the null hypothesis ($H_{0}$) of no differences in the multivariate centroids (the central location, or vector of mean parameters for all variables) among the groups. These tests also require the total number of sample units ($N$, say) to be large relative to the number of variables ($p$, say), and cannot be calculated when $p>N$. In many biological, ecological, and environmental data sets, the assumptions of MANOVA are not likely to be met (e.g., Clarke 1993, McArdle and Anderson 2001). A number of more robust methods to compare groups of multivariate sample units have been proposed, and several of these have now become very widely used in ecology. They include the analysis of similarities (ANOSIM; Clarke 1993), permutational multivariate analysis of variance (PERMANOVA; Anderson 2001; see also Pillar and Orloci 1996, Gower and Krzanowski 1999, Legendre and Anderson 1999, McArdle and Anderson 2001), and the Mantel test (Mantel 1967; see also Mantel and Valand 1970).

These methods all construct ANOVA-like test statistics from a matrix of resemblances (distances, dissimilarities, or similarities) calculated among the sample units, and obtain $P$-values using random permutations of observations among the groups, thereby **assuming** only exchangeability for the one-way case. Any resemblance measure may be chosen as the basis of the analysis (optionally after first transforming the data, e.g., Clarke and Green 1988, Clarke 1993) to reflect whatever qualities among the samples may be of greatest interest (e.g., Legendre and Legendre 1998, Clarke et al. 2006, Anderson et al. 2011).

The test statistics inherent in resemblance-based permutation tests were modeled to varying degrees on Fisher's $F$ statistic used in univariate ANOVA (Snedecor 1934), specifically by contrasting some function of the between-group vs. the within-group resemblances (e.g., Mantel), their squares (e.g., PERMANOVA), or their ranks (e.g., ANOSIM). They are therefore generally used and interpreted by practitioners for detection of differences in the locations (centroids) of multivariate groups. What is not widely appreciated, however, is that they are actually testing different null hypotheses.

**For balanced designs, PERMANOVA was quite robust to heterogeneity, but ANOSIM and the Mantel test were not.** ANOSIM and the Mantel test examine the more general $H_{0}$: “*samples in the same group are no more tightly clustered together than samples from different groups*,” whereas PERMANOVA focuses on the more specific $H_{0}$: “*there are no differences in centroids among the groups*.” The former respond to the “*clumping of samples within groups*,” the latter to “*differences in centroids*” (a shift in the location of the multivariate data cloud).

As ANOSIM and the Mantel test are more general “omnibus” tests, rejection of the null hypothesis in either case will indicate only that some feature of the groups differs to make them distinct. This feature could be (1) locations, (2) dispersions, or (3) the particular shape (correlation structure) of the data clouds being compared, or indeed some combination of these things. Although reduced-space ordinations (such as nonmetric MDS) can assist in interpreting the potential nature of any differences detected, it is not possible with these tests or any associated plots to make more specific statistical inferences.

Although the generality of these more omnibus tests can often be useful, in many ecological studies it may be quite important to hone inferences further. For example, ecologists may want to distinguish: *has there been a fundamental shift in the community structure itself (a change in location)*? Or rather, *has the community structure become more (or less) variable (a change in dispersion)*? Or both? For balanced designs, PERMANOVA can be used effectively to make inferences about differences in centroids alone (i.e., shifts in the location of the multivariate cloud of sample units in the space of the resemblance measure), while PERMDISP can be used to make inferences about differences in multivariate dispersions alone.

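**Dissimilarity measure**:   Some of them (for presence/absence data, $a$ is the number of species present in both samples, and $b$ and $c$ are the numbers of species present only in the first or only in the second sample):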

 -   Jaccard $d_{J}=(b+c)/(a+b+c)$
 -   Sorensen $d_{S}=(b+c)/(2a+b+c)$
 -   Bray-Curtis $d_{BC}=\frac{\sum_{i=1}^{S}\mid x_{1i}-x_{2i}\mid}{\sum_{i=1}^{S}(x_{1i}+x_{2i})}$, $x_{1i}$ is the abundance of species $i$ at sample 1. $0\leq d_{BC}\leq1$. When used with presence/absence, $d_{BC}=d_{S}$.
-   Gower $d_{G}=\frac{\sum_{i=1}^{S}w_{i}\mid x_{1i}-x_{2i}\mid/R_{i}}{\sum_{i=1}^{S}w_{i}}$, where $R_{i}$ is the range of the $i$th species and $w_{i}$ is an optional weight given to each species (default $=1$). When used with presence/absence, $d_{G}=d_{J}$.
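A minimal sketch of the first three measures for two hypothetical samples, with illustrative helper names. It also checks two statements above: Bray-Curtis on presence/absence data equals Sørensen, and base R's `dist(method = "binary")` gives the Jaccard dissimilarity.

```r
# Two hypothetical samples: abundances of the same S = 6 species
x1 <- c(5, 0, 3, 2, 0, 1)
x2 <- c(2, 4, 0, 2, 0, 1)

# a = shared species, b = only in sample 1, c = only in sample 2
pa_counts <- function(x1, x2) {
  p1 <- x1 > 0; p2 <- x2 > 0
  c(a = sum(p1 & p2), b = sum(p1 & !p2), c = sum(!p1 & p2))
}
jaccard <- function(x1, x2) {
  k <- pa_counts(x1, x2)
  unname((k["b"] + k["c"]) / (k["a"] + k["b"] + k["c"]))
}
sorensen <- function(x1, x2) {
  k <- pa_counts(x1, x2)
  unname((k["b"] + k["c"]) / (2 * k["a"] + k["b"] + k["c"]))
}
bray_curtis <- function(x1, x2) sum(abs(x1 - x2)) / sum(x1 + x2)

c(jaccard = jaccard(x1, x2), sorensen = sorensen(x1, x2), bray = bray_curtis(x1, x2))
# Bray-Curtis on presence/absence data reproduces Sorensen
bray_curtis(1 * (x1 > 0), 1 * (x2 > 0))
```

Species absent from both samples (the fifth here) do not affect any of these three measures, which is why they suit community data with many shared zeros.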

**PERMANOVA**:   The null hypothesis tested by PERMANOVA is that, under the assumption of exchangeability of the sample units among the groups, $H_{0}$: “the centroids of the groups, as defined in the space of the chosen resemblance measure, are equivalent for all groups.” Thus, if $H_{0}$ were true, any observed differences among the centroids in a given set of data will be similar in size to what would be obtained under random allocation of individual sample units to the groups (i.e., under permutation).
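A base-R sketch of the one-way test, using the sums of squares computed directly from distances as described by Anderson (2001): $SS_{T}=\frac{1}{N}\sum_{i<j}d_{ij}^{2}$, $SS_{W}=\sum_{g}\frac{1}{n_{g}}\sum_{i<j\in g}d_{ij}^{2}$, $SS_{A}=SS_{T}-SS_{W}$, and pseudo-$F=\frac{SS_{A}/(a-1)}{SS_{W}/(N-a)}$ for $a$ groups. The p-value comes from shuffling group labels, which is where the exchangeability assumption enters. The data are simulated and the helper names hypothetical; `vegan::adonis2()` is the usual implementation.

```r
# One-way PERMANOVA pseudo-F computed directly from a distance matrix
pseudo_F <- function(d, g) {
  D2 <- as.matrix(d)^2
  g <- factor(g)
  N <- length(g)
  a <- nlevels(g)
  ss_total <- sum(D2[upper.tri(D2)]) / N
  ss_within <- sum(sapply(levels(g), function(lev) {
    idx <- which(g == lev)
    Dg <- D2[idx, idx, drop = FALSE]
    sum(Dg[upper.tri(Dg)]) / length(idx)
  }))
  ss_among <- ss_total - ss_within
  (ss_among / (a - 1)) / (ss_within / (N - a))
}

# Permutation test: shuffle group labels (exchangeable under H0)
permanova <- function(d, g, nperm = 999) {
  F_obs <- pseudo_F(d, g)
  F_perm <- replicate(nperm, pseudo_F(d, sample(g)))
  list(F = F_obs, p = (sum(F_perm >= F_obs) + 1) / (nperm + 1))
}

set.seed(7)
g <- rep(c("control", "treatment"), each = 6)
Y <- matrix(rpois(12 * 5, lambda = 5), nrow = 12)      # 12 samples x 5 species
Y[g == "treatment", 1] <- Y[g == "treatment", 1] + 8   # shift species 1 under treatment
res <- permanova(dist(Y), g)
res
```

With Euclidean distances on a single variable, the pseudo-$F$ reduces exactly to the classical one-way ANOVA $F$ ratio, which is a useful sanity check on any implementation.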

**Mantel test**:   Although the Mantel test is usually used to compare two distance matrices, it can be used to compare groups of samples by coding a contrast of between- vs. within-group distances in a model matrix. The null hypothesis for the Mantel test, as originally described, is: $H_{0}$: “there is no relationship between the inter-point distances in one distance matrix and the inter-point distances in a second distance matrix.” However, when the second distance matrix contains codes that contrast between- vs. within-group distances, the null hypothesis (once again, under the assumption of exchangeability) becomes $H_{0}$: “the average of the within-group distances is greater than or equal to the average of the between-group distances.” The alternative hypothesis is that the within-group distances are smaller, on average, than the between-group distances.
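The between- vs. within-group form of the Mantel test can be sketched as a Pearson correlation between the distances and a 0/1 model matrix (1 for between-group pairs), with a one-sided permutation p-value. This is a hypothetical minimal version for illustration, reusing the same made-up points as a toy example; in R, `vegan::mantel()` runs the general two-matrix test.

```python
import math
import random


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_groups(d, groups, n_perm=999, seed=1):
    """Mantel test of d against a 0/1 model matrix (1 = between-group pair)."""
    n = len(groups)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dist = [d[i][j] for i, j in pairs]

    def statistic(labels):
        model = [0.0 if labels[i] == labels[j] else 1.0 for i, j in pairs]
        return pearson(dist, model)

    rng = random.Random(seed)
    r_obs = statistic(groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permuting labels = permuting rows/columns of the model matrix
        if statistic(labels) >= r_obs:  # large r: within-group distances are smaller
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
groups = ["A"] * 4 + ["B"] * 4
d = [[math.dist(p, q) for q in points] for p in points]
r_obs, p_mantel = mantel_groups(d, groups)
print(round(r_obs, 3), p_mantel)
```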

**ANOSIM**:   The null hypothesis for the ANOSIM test is closely related to this, namely $H_{0}$: “the average of the ranks of within-group distances is greater than or equal to the average of the ranks of between-group distances,” where a single ranking has been done across all inter-point distances in the distance matrix and the smallest distance (highest similarity) has a rank value of 1. For both the ANOSIM test and the Mantel test (in this form), the essence of what is being tested is the degree to which there is greater clumping (smaller distances) among samples within the same group compared to that observed among samples in different groups. The null hypotheses for ANOSIM or the Mantel test are therefore more general (less specific) than the null hypothesis tested by PERMANOVA. Thus, a significant result using ANOSIM or the Mantel test could indicate that the groups differ in their location, their dispersion, or some other distributional quality, such as their degree of skewness, non-sphericity (correlation structure), or some combination of these things, any of which can make the distribution of samples within a given group distinguishable from the rest.
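A matching sketch of ANOSIM: all inter-point distances are ranked once (smallest = rank 1, ties averaged) and the usual statistic $R=(\bar{r}_{B}-\bar{r}_{W})/(M/2)$ is computed, where $\bar{r}_{B}$ and $\bar{r}_{W}$ are the mean between- and within-group ranks and $M=n(n-1)/2$ is the number of distances. Only the group labels are permuted. Function names and data are illustrative assumptions.

```python
import math
import random


def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def anosim_r(rank_of, pairs, labels):
    within = [r for r, (i, j) in zip(rank_of, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(rank_of, pairs) if labels[i] != labels[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


def anosim(d, groups, n_perm=999, seed=1):
    n = len(groups)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    rank_of = average_ranks([d[i][j] for i, j in pairs])  # one ranking of all distances
    r_obs = anosim_r(rank_of, pairs, groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if anosim_r(rank_of, pairs, labels) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
groups = ["A"] * 4 + ["B"] * 4
d = [[math.dist(p, q) for q in points] for p in points]
R, p_anosim = anosim(d, groups)
print(R, p_anosim)
```

Because every within-group distance is smaller than every between-group distance in these toy data, $R=1$, its maximum. In R, `vegan::anosim()` is the usual implementation.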

**PERMDISP**:   a resemblance-based permutation test focused strictly on the null hypothesis of homogeneity of multivariate dispersions (Anderson 2006).
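A simplified PERMDISP sketch, assuming the samples already have Euclidean coordinates (for a non-Euclidean resemblance measure, the samples are first placed in Euclidean space by principal coordinates analysis, a step skipped here). Each sample's distance to its own group centroid is computed, and a one-way ANOVA F on those distances is compared against permutations of the distances among groups; this permutation scheme is a simplification. In the made-up data, group B has ten times the spread of group A.

```python
import math
import random


def _centroid(points):
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def dispersion_z(points, groups):
    """Euclidean distance of each sample to its own group centroid."""
    cents = {lab: _centroid([p for p, g in zip(points, groups) if g == lab])
             for lab in set(groups)}
    return [math.dist(p, cents[g]) for p, g in zip(points, groups)]


def anova_f(values, groups):
    labels = sorted(set(groups))
    n, g = len(values), len(labels)
    grand = sum(values) / n
    ss_b = ss_w = 0.0
    for lab in labels:
        v = [x for x, gg in zip(values, groups) if gg == lab]
        m = sum(v) / len(v)
        ss_b += len(v) * (m - grand) ** 2
        ss_w += sum((x - m) ** 2 for x in v)
    return (ss_b / (g - 1)) / (ss_w / (n - g))


def permdisp(points, groups, n_perm=999, seed=1):
    z = dispersion_z(points, groups)
    f_obs = anova_f(z, groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign the distances z among groups
        if anova_f(z, labels) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)


group_a = [(0, 0), (0.2, 0.1), (-0.1, 0.3), (0.1, -0.2), (0.3, 0.2)]
group_b = [(10 * x, 10 * y) for x, y in group_a]
points = group_a + group_b
groups = ["A"] * 5 + ["B"] * 5
f_disp, p_disp = permdisp(points, groups)
print(round(f_disp, 2), p_disp)
```

In R, `vegan::betadisper()` followed by `permutest()` provides the full procedure.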

**Storage effect**:   Chesson (2000) reviews two mechanisms by which temporal environmental fluctuations may promote coexistence: the temporal storage effect and relative nonlinearity. Under the **storage effect**, species may have similar responses to limiting resources, but each species is favoured during a different period. For example, annual plant species may differ in their germination cues, with some species germinating following warm rains, and other species germinating following cold rains (Levine et al. 2008). The storage effect receives its name because, to survive unfavourable periods, species must store the benefits of favourable periods in dormant seeds, diapause or long-lived adult stages. Under **relative nonlinearity**, species are active at exactly the same times, but the reaction norms relating population growth rates to a limiting (competitive) factor take different shapes for different species. For example, population growth of a conservative life-history species might increase rapidly at low resource availability before reaching an asymptote at a relatively low maximum growth rate. In contrast, a species with a more resource-acquisitive life history may respond only weakly to increases in resources at low ambient levels, but may achieve a higher population growth rate at high resource availability. If resource availability fluctuates, due to endogenous or exogenous forcing, the species with the conservative strategy will be favoured at times of low resource availability, while the more acquisitive species will be favoured at times of high resource availability. The mechanism stabilises coexistence because each species influences the pattern of resource availability in a way that favours its competitor. 
When the conservative species is most abundant and the acquisitive species is rare, resource pulses are poorly utilised; when the acquisitive species is most abundant and the conservative species is rare, resource pulses are quickly drawn down to a level where the conservative species has an advantage.

**Rao index**:   If $d_{ij}$ is the dissimilarity between species $i$ and $j$, $p_{i}$ is the proportion of the $i$th species in a community and $s$ is the number of species in the community, then $\alpha FD=\sum_{i=1}^{s}\sum_{j=1}^{s}d_{ij}p_{i}p_{j}$. There are several possible ways to calculate $d_{ij}$, depending on the type of data and traits available; how dissimilarity is calculated is one of the most important methodological decisions affecting the behaviour of FD. At the regional scale, $\gamma FD=\sum_{i=1}^{S}\sum_{j=1}^{S}d_{ij}P_{i}P_{j}$, where $P_{i}=\sum_{c=1}^{n}\frac{p_{ic}}{n}$ (unweighted), $n$ is the number of sampling points in the region (indexed by $c$) and $S$ is the number of species in the region; then $\beta FD=\gamma FD-\overline{\alpha FD}$. For taxonomic diversity, $d_{ij}=1$ for every $i\neq j$ and $d_{ii}=0$ (i.e. a matrix of ones with a zero diagonal), and $\alpha$Rao equals the Simpson diversity index in its Gini-Simpson form, $1-\sum_{i}p_{i}^{2}$ (Pavoine et al. 2004; Botta-Dukát 2005; Ricotta 2005a). The Rao index can produce systematically lower-than-expected $\beta$-diversity values, so a correction based on Jost (2007) is needed. R code is available at http://www.butbn.cas.cz/francesco/Webpage/R_Functions.html
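The Rao partition can be sketched directly from the formulas above, using unweighted pooling for the regional proportions $P_i$; the Jost (2007) correction is not applied here. Function names and the toy abundance vectors are hypothetical. The first example checks the taxonomic special case against Gini-Simpson.

```python
def rao_q(p, d):
    """Rao's quadratic entropy: sum over i, j of d_ij * p_i * p_j."""
    s = len(p)
    return sum(d[i][j] * p[i] * p[j] for i in range(s) for j in range(s))


def rao_partition(comms, d):
    """Mean alpha, gamma and beta (= gamma - mean alpha) for n communities.

    comms: relative-abundance vectors over the same species list.
    Regional proportions are unweighted means, P_i = sum_c p_ic / n."""
    n = len(comms)
    pooled = [sum(c[i] for c in comms) / n for i in range(len(d))]
    alpha_bar = sum(rao_q(c, d) for c in comms) / n
    gamma = rao_q(pooled, d)
    return alpha_bar, gamma, gamma - alpha_bar


d_tax = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # taxonomic case: 1 off the diagonal
p = [0.5, 0.3, 0.2]
print(rao_q(p, d_tax), 1 - sum(pi ** 2 for pi in p))  # both about 0.62 (Gini-Simpson)
print(rao_partition([[1, 0, 0], [0, 1, 0]], d_tax))  # (0.0, 0.5, 0.5)
```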

**Mason index**:   This intuitive approach treats $\alpha FD$ as the abundance-weighted variance of species traits within a community (i.e. their deviation from the community mean): $\alpha FD=\sum_{i=1}^{s}p_{i}(x_{i}-\bar{x})^{2}$, where $x_{i}$ is the mean trait value of the $i$th species and $\bar{x}$ is the community mean, $\bar{x}=\sum_{i=1}^{s}p_{i}x_{i}$. Then $\beta FD=\sum_{c=1}^{n}\frac{1}{n}(\bar{x}_{c}-\bar{x}_{region})^{2}$, where $n$ is the number of communities in the region, $\bar{x}_{c}$ is the mean trait value of the $c$th community and $\bar{x}_{region}$ is the overall mean across all communities in the region. A disadvantage of Mason’s FD approach is that trait diversity is calculated using only the species mean trait values; consequently, within-species variability in a trait is not accounted for.
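Mason's within-community variance and the among-community component can be sketched as follows. Taking $\bar{x}_{region}$ as the unweighted mean of the community means is an assumption of this sketch, and the trait values are made up.

```python
def community_mean(p, x):
    return sum(pi * xi for pi, xi in zip(p, x))


def mason_alpha(p, x):
    """Abundance-weighted variance of species mean trait values within one community."""
    xbar = community_mean(p, x)
    return sum(pi * (xi - xbar) ** 2 for pi, xi in zip(p, x))


def mason_beta(comms, x):
    """Variance of community mean traits around the regional mean (unweighted)."""
    means = [community_mean(p, x) for p in comms]
    region = sum(means) / len(means)
    return sum((m - region) ** 2 for m in means) / len(means)


traits = [1.0, 2.0, 4.0]  # hypothetical species mean trait values
print(mason_alpha([0.5, 0.25, 0.25], traits))      # 1.5
print(mason_beta([[1, 0, 0], [0, 0, 1]], traits))  # 2.25
```

Because only species means enter `traits`, the sketch shares the limitation noted above: intraspecific variation is invisible to it.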

**Functional diversity indices**:   These fall into three categories: functional richness, functional evenness and functional divergence. Functional richness represents the amount of functional space occupied by a species assemblage. Functional evenness corresponds to how regularly species abundances are distributed in the functional space. Finally, functional divergence describes the degree to which species abundances are concentrated far from the centre of the functional space.

**Nestedness**:   Nestedness is a pattern characterized by several features, including a skewed distribution of the number of interacting partners per species, with many specialist species and few extremely generalist species. Nestedness also implies asymmetric specialization, such that specialist species tend to interact with generalist ones. Finally, the generalist species in a nested network form a single, highly connected core, making the network very cohesive. Three main hypotheses have been proposed to explain the biology behind this seemingly highly organized structure. The first is that nestedness is “neutral”, meaning that all interactions between individuals are equally likely, so that more abundant species have more partners. However, the empirical correlation between species abundances and species generalism is not easy to interpret. Do species become generalists because they are more abundant, or are they more abundant because they are generalists and can therefore access more resources? The second hypothesis suggests that nestedness affects ecological dynamics, particularly species coexistence and community stability. A simple argument supporting this hypothesis is that it is much safer for specialist species to interact with generalist species than with other specialists, because generalist species are expected to have less-fluctuating population dynamics and so to be more reliable partners. According to the third hypothesis, nested architecture may be shaped by the (co-)evolutionary dynamics of species interacting within a community.

**Emergent neutrality**:   addresses the question of whether, and how, interspecific interactions can drive a community towards a state in which relative competitive abilities differ so little that demographic stochasticity and potential immigration from outside sources dominate community patterns, instead of the details of the species interactions themselves. The theory of self-organized similarity (SOS) predicts the self-organized, competition-driven emergence of regularly distributed groups of similar species in niche space (Scheffer and van Nes 2006). The model suggests that in order to coexist, competing species must be either similar in their traits (i.e. belong to the same emerging group) or radically different (i.e. belong to distinct groups). Species occupying intermediate niche positions between groups are rapidly driven to extinction by competitive exclusion.

**Stabilizing niche differences**:   are those differences that cause species to limit themselves more strongly than they limit others, through, for example, resource partitioning, host-specific natural enemies, or storage effects. When these stabilizing niche differences are greater than relative fitness differences, they foster diversity during community assembly by preventing competitive exclusion of inferior competitors by superior competitors. Negative frequency-dependent population growth rates are the hallmark of stabilizing niche differences and can thus be used to assess whether community composition during community assembly is stabilized by niche differences.

**Relative fitness differences**:   are those differences between species that predict the outcome of competition in the absence of stabilizing niche differences. They have also been called fitness inequalities (Chesson 2000, Adler et al. 2007). Note that fitness is used in an ecological, not evolutionary, context: species are the unit of comparison for fitness differences in coexistence theory, not individuals (as in evolutionary studies). As with stabilizing niche differences, these fitness differences can arise through many mechanisms, including environmentally mediated differences in fecundity or differences in the ability to take up limiting resources and/or tolerate herbivores. In practice, relative fitness differences are difficult to disentangle from stabilizing niche differences.
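One common way to make the comparison of stabilizing niche differences and relative fitness differences concrete is Chesson's (2000) formalization for two species under Lotka-Volterra competition, $dN_i/dt = r_iN_i(1-\alpha_{ii}N_i-\alpha_{ij}N_j)$, where $\alpha_{ij}$ is the per-capita effect of species $j$ on species $i$. Niche overlap is $\rho=\sqrt{\alpha_{12}\alpha_{21}/(\alpha_{11}\alpha_{22})}$ (stabilizing niche difference $1-\rho$), the fitness ratio is $\kappa_1/\kappa_2=\sqrt{\alpha_{21}\alpha_{22}/(\alpha_{11}\alpha_{12})}$, and coexistence requires $\rho<\kappa_1/\kappa_2<1/\rho$. The sketch assumes this particular model; other models (e.g. annual plant models) define the two quantities differently, and the coefficients below are hypothetical.

```python
import math


def niche_overlap(a11, a12, a21, a22):
    """rho; a_ij is the per-capita competitive effect of species j on species i."""
    return math.sqrt((a12 * a21) / (a11 * a22))


def fitness_ratio(a11, a12, a21, a22):
    """kappa_1 / kappa_2 for the Lotka-Volterra form assumed here."""
    return math.sqrt((a21 * a22) / (a11 * a12))


def coexist(a11, a12, a21, a22):
    """Stable coexistence when rho < kappa_1/kappa_2 < 1/rho."""
    rho = niche_overlap(a11, a12, a21, a22)
    ratio = fitness_ratio(a11, a12, a21, a22)
    return rho < ratio < 1.0 / rho


print(coexist(1.0, 0.5, 0.5, 1.0))  # True: strong self-limitation, equal fitness
print(coexist(1.0, 0.5, 1.2, 1.0))  # False: species 1's fitness advantage is too large
```

Algebraically, the criterion is the same as mutual invasibility ($\alpha_{12}<\alpha_{22}$ and $\alpha_{21}<\alpha_{11}$): larger niche differences (smaller $\rho$) widen the range of fitness ratios compatible with coexistence.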

**Disadvantages of trait- and phylogeny-based studies**:   Contemporary coexistence theory demonstrates that large relative fitness differences and competitive exclusion can lead to trait clustering (Mayfield & Levine 2010), calling into question the assumption that trait (or phylogenetic) clustering is solely the outcome of environmental filters. The interpretation of phylogenetic overdispersion as reflecting limiting similarity is also complicated, because this pattern may reflect a lack of stabilizing niche differences between closely related species (if traits are conserved) or environmental filtering of species with similar traits (if traits are convergent; Cavender-Bares et al. 2004b, 2009). Finally, phylogenetic or trait distribution patterns that do not deviate from null expectations are difficult to interpret, as these could reflect a combination or cancelling out of environmental filters, relative fitness differences, or stabilizing niche differences (Mayfield et al. 2005). Despite these disadvantages, advanced statistical techniques are continually improving our ability to infer processes from patterns (Pillar & Duarte 2010, Chase & Myers 2011, Ives & Helmus 2011, Pavoine et al. 2011). (From HilleRisLambers et al. 2012, Annu. Rev.)

**Generalized Linear Models**:   GLMs, specifically the techniques known as "logistic regression" and "Poisson regression", are used to model species-environment relationships. However, they are limited to linear functions of the predictors. The predicted values from GLMs form sigmoidal or modal curves, which gives the impression that the models are not really linear. This is an artifact of the transformations employed, however: the models are linear on the logit (log of the odds) or log scale. This is simultaneously a benefit (the models are parametric, with a wealth of theory applicable to their analysis and interpretation) and a hindrance (they require a priori specification of the curve shape, and have difficulty fitting data that do not follow a simple parametric curve shape). R code: `glm(y ~ x, family = binomial)`
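To illustrate that a GLM is linear on the link scale even when its fitted curve is modal, here is a minimal logistic regression fitted by iteratively reweighted least squares (IRLS) with a quadratic predictor, a Python analogue of `glm(y ~ x + I(x^2), family = binomial)` in R. It is a teaching sketch with no convergence checks or handling of separation, and the presence/absence data are made up.

```python
import math


def solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(X, y, n_iter=25):
    """Logistic regression, logit(p) = X @ beta, fitted by IRLS."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        eta = [sum(b * v for b, v in zip(beta, row)) for row in X]
        p = [inv_logit(e) for e in eta]
        w = [pi * (1 - pi) for pi in p]
        z = [e + (yi - pi) / wi for e, yi, pi, wi in zip(eta, y, p, w)]  # working response
        xtwx = [[sum(wi * row[i] * row[j] for wi, row in zip(w, X)) for j in range(k)]
                for i in range(k)]
        xtwz = [sum(wi * row[i] * zi for wi, row, zi in zip(w, X, z)) for i in range(k)]
        beta = solve(xtwx, xtwz)
    return beta


# Made-up presence/absence of a species along a gradient (illustration only).
xs = [(i - 5.5) / 2 for i in range(12)]
y = [0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0]
X = [[1.0, x, x * x] for x in xs]  # intercept, linear and quadratic terms
beta = fit_logistic(X, y)
print(beta)  # negative quadratic coefficient: a modal (hump-shaped) response
```

The linear predictor `X @ beta` is a straight quadratic polynomial; only after `inv_logit` does the fitted probability become the hump-shaped curve described above.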

**Generalized Additive Models**:   GAMs are designed to capitalize on the strengths of GLMs (ability to fit logistic and Poisson regressions) without requiring the problematic steps of a priori estimation of response curve shape or a specific parametric response function. The idea is simple: let the data speak, and draw a smooth curve through the data, using a class of equations called "smoothers" or "scatterplot smoothers" that generalize data into smooth curves by local fitting to subsections of the data. The idea behind GAMs is to "plot" (conceptually, not literally) the value of the dependent variable along a single independent variable, and then to calculate a smooth curve that goes through the data as well as possible while being parsimonious. The trick is in the parsimony. Using a polynomial of high enough order, it would be possible to get a curve that went through every point. It is likely, however, that such a curve would "wiggle" excessively and not represent a parsimonious fit.

As a practical matter, we can view GAMs as non-parametric curve fitters that attempt to achieve an optimal compromise between goodness-of-fit and parsimony of the final curve. Similar to GLMs, on species data they operate on deviance, rather than variance, and attempt to achieve the minimal residual deviance on the fewest degrees of freedom. One of the interesting aspects of GAMs is that they can only approximate the appropriate number of degrees of freedom, and that the number of degrees of freedom is often not an integer, but rather a real number with some fractional component. This seems very odd at first, but is actually a fairly straightforward extension of the concepts you are already familiar with. A second-order polynomial (or quadratic equation) in a GLM uses two degrees of freedom (plus one for the intercept). A curve that is slightly less regular than a quadratic might require two and a half degrees of freedom (plus one for the intercept), but might fit the data better.
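Fractional degrees of freedom are easiest to see with a linear smoother, whose fitted values are $S\,y$ for a smoother matrix $S$; the effective degrees of freedom are then the trace of $S$. The sketch below uses a Gaussian-kernel (Nadaraya-Watson) smoother as a stand-in; this is an assumption for illustration, not the penalized regression splines that `mgcv` fits, and it works on a Gaussian response scale rather than through a link function.

```python
import math


def smoother_matrix(x, bandwidth):
    """Row-normalized Gaussian-kernel (Nadaraya-Watson) weights: fitted values = S @ y."""
    S = []
    for xi in x:
        k = [math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2) for xj in x]
        total = sum(k)
        S.append([kj / total for kj in k])
    return S


def smooth(x, y, bandwidth):
    S = smoother_matrix(x, bandwidth)
    return [sum(s * yj for s, yj in zip(row, y)) for row in S]


def effective_df(x, bandwidth):
    """Effective degrees of freedom of the smoother: the trace of S."""
    S = smoother_matrix(x, bandwidth)
    return sum(S[i][i] for i in range(len(x)))


x = [0.5 * i for i in range(21)]
for bw in (0.1, 0.5, 1.0, 2.0, 100.0):
    print(bw, round(effective_df(x, bw), 2))
```

A tiny bandwidth interpolates the 21 points (about 21 degrees of freedom, the "wiggly" extreme), a huge one collapses to the mean (about 1), and intermediate bandwidths give non-integer values in between; choosing the bandwidth is the fit-versus-parsimony compromise described above.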

The other way in which GAMs differ is that they do not handle interactions well. Rather than fit multiple variables simultaneously, the algorithm fits a smooth curve to each variable and then combines the results additively, thus giving rise to the name "*Generalized Additive Models*." In practice, interaction terms can be included and can be significant, but they often require fairly large amounts of data. Like GLMs, GAMs are suitable for fitting logistic and Poisson regressions, which is of course very useful in ecological analysis.

GAMs are extremely flexible models for fitting smooth curves to data. On ecological data they often achieve results superior to GLMs, at least in terms of goodness-of-fit. On the other hand, because they lack a parametric equation for their results, they are somewhat hard to evaluate except graphically; you can’t provide an equation of the result in print. Because of this, some ecologists find GLMs more parsimonious, and prefer the results of GLMs to GAMs even at the cost of a lower goodness-of-fit, due to increased understanding of the results. Many ecologists fit GAMs as a means of determining the correct curve shape for GLMs, deciding whether to fit low- or high-order polynomials as suggested by the GAM plot. R code: `library(mgcv); gam(y ~ s(x), family = binomial)`, where `s()` denotes a smooth term.

**Linear mixed models**: statistical models that assume normally distributed errors and also include both fixed and random effects, such as ANOVA incorporating a random effect.

**Generalized linear mixed model**:   Generalized linear mixed models (GLMMs) combine the properties of two statistical frameworks: linear mixed models (which incorporate random effects) and generalized linear models (which handle nonnormal data by using link functions and the **exponential family** of distributions [normal, Poisson, binomial, exponential and gamma]). GLMMs are the best tool for analyzing nonnormal data that involve random effects: all one has to do, in principle, is specify a distribution, link function and structure of the random effects.  
The first important feature of GLMMs is the ability to write sub-models for model coefficients, leading to the various names: hierarchical models, multi-level models or mixed (effect) models (Gelman & Hill 2007). Sub-models may simply be random effects, e.g. plots vary around site means or species vary in their average abundances. But sub-models can also have covariates, e.g. rock cover in plots affects the abundance, or species average abundance may be a function of their life form.

**Link function:**   a continuous function that relates the expected value of the response to the linear predictor in a generalized linear model, such as the logit and probit links. Applying the link function makes the expected value of the response a linear function of the predictors; unequal variances are handled through the variance function of the chosen distribution rather than by the link itself.
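A small sketch of three common links and their inverses, using only the Python standard library (`statistics.NormalDist` supplies the standard normal quantile and CDF for the probit). The coefficients `b0` and `b1` are hypothetical; the point is that the linear predictor is linear in `x`, and the inverse link maps it onto the scale of the mean.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def logit(mu):
    return math.log(mu / (1.0 - mu))


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def probit(mu):
    return STD_NORMAL.inv_cdf(mu)


def inv_probit(eta):
    return STD_NORMAL.cdf(eta)


def log_link(mu):
    return math.log(mu)


def inv_log(eta):
    return math.exp(eta)


# The linear predictor eta = b0 + b1*x is linear in x; the inverse link maps it to the mean.
b0, b1 = -2.0, 0.8  # hypothetical coefficients
for x in (0.0, 2.5, 5.0):
    eta = b0 + b1 * x
    print(x, round(inv_logit(eta), 3), round(inv_probit(eta), 3), round(inv_log(eta), 3))
```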

**Random effects:**   factors whose levels are sampled from a larger population, or whose interest lies in the variation among them rather than the specific effects of each level. The parameters of random effects are the standard deviations of variation at a particular level (e.g. among experimental blocks). The precise definitions of fixed and random are controversial; the status of particular variables depends on experimental design and context.

**Markov chain Monte Carlo**:  a Bayesian statistical technique that samples parameters according to a stochastic algorithm that converges on the posterior probability distribution of the parameters, combining information from the likelihood and the prior distributions.
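A minimal random-walk Metropolis sampler shows how the likelihood and the prior combine. The example is deliberately simple and hypothetical: a binomial proportion with a uniform Beta(1, 1) prior and made-up data (7 successes in 20 trials), chosen because the exact posterior, Beta(8, 14), is known and can be compared with the MCMC estimate.

```python
import math
import random


def log_posterior(p, k, n, a=1.0, b=1.0):
    """Binomial log-likelihood plus Beta(a, b) log-prior, up to an additive constant."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return (k + a - 1) * math.log(p) + (n - k + b - 1) * math.log(1 - p)


def metropolis(k, n, n_iter=20000, step=0.1, seed=42):
    """Random-walk Metropolis sampler for a binomial proportion p."""
    rng = random.Random(seed)
    p = 0.5
    lp = log_posterior(p, k, n)
    draws = []
    for _ in range(n_iter):
        proposal = p + rng.gauss(0.0, step)  # symmetric proposal
        lp_new = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_new - lp:
            p, lp = proposal, lp_new
        draws.append(p)
    return draws


draws = metropolis(k=7, n=20)[2000:]  # discard burn-in
posterior_mean = sum(draws) / len(draws)
print(round(posterior_mean, 3), round(8 / 22, 3))  # MCMC estimate vs exact Beta(8, 14) mean
```

Real analyses would also check convergence (e.g. several chains, trace plots); this sketch only shows the accept/reject mechanics.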

**MacArthur’s paradox**:   At one extreme, MacArthur’s work on limiting similarity, competition and coexistence viewed species' trait-environment relations and interspecific trade-offs as paramount for species coexistence and diversity at local spatial scales. At the other extreme, MacArthur’s work on the theory of island biogeography ignored any differences among species in their traits (i.e. species were neutral) when predicting species diversity on islands that vary in size and isolation from the mainland as a function of colonization and extinction rates.

**Species relative abundance**:   Two classic models are Fisher’s logseries distribution and Preston’s (1948) lognormal distribution. The lognormal has turned out to be much more widespread.

- **Fisher’s logseries** :   It derives from the negative binomial distribution: Fisher truncated the negative binomial to eliminate the zero class and took the limit as its shape parameter approaches zero. The number of species in a collection having $n$ individuals is expected to be $\alpha x^{n}/n$, where $x$ is a positive constant between 0 and 1 and $\alpha$ is a measure of diversity whose expected value equals the number of singleton species divided by $x$. The total number of species $S$ is expected to be $\alpha[-\ln(1-x)]$ and the total number of individuals $N=\alpha x/(1-x)$. The parameter $\alpha$, known as Fisher’s $\alpha$, is a widely used measure of species diversity because it is theoretically independent of sample size.
- **Lognormal** :   Over the past half century the lognormal distribution has been fitted successfully to a far larger number of relative species abundance distributions. As sample size increases, more of the lognormal is revealed: Preston’s “veil line”, which hides the rarest species on the left-hand side of the curve, moves further to the left. However, the lognormal remained theoretically unexplained. For a time after the apparent failure of deductive approaches, the general opinion was that the lognormal was of little or no biological interest. It was repeatedly pointed out that lognormals could arise simply from the multiplicative interaction of many random processes affecting population growth, or from combining unrelated samples. However, interest in the lognormal was rekindled when Sugihara (1980) argued that the lognormals describing relative abundance were not just any lognormals, but a special class of so-called canonical lognormals.
- **Whittaker’s dominance-diversity plot** :   a graph of the logarithm of the abundance of a species on the y-axis against the rank in abundance of the species on the x-axis. Common species are assigned low ranks and appear to the left. On such a plot, the logseries appears approximately linear, whereas the lognormal is curvilinear, with an S shape.
- **MacArthur’s broken-stick hypothesis** :   Suppose a community of $S$ species randomly divides up a common resource. To model this, randomly partition the resource pool by throwing $S-1$ random points onto a stick of unit length. Then break the stick at each random point and rank the fragments from longest to shortest. The expected relative abundance of the $i$th most abundant (longest) species is $y_{i}=\frac{1}{S}\sum_{x=i}^{S}\frac{1}{x}$.
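Two of the models above can be sketched numerically. Eliminating $x$ from the logseries relations ($x=N/(N+\alpha)$) gives $S=\alpha\ln(1+N/\alpha)$, which the sketch solves for Fisher’s $\alpha$ by bisection. The broken-stick function evaluates $y_i=\frac{1}{S}\sum_{x=i}^{S}\frac{1}{x}$, which gives the expected share of the $i$th most abundant (longest) fragment. Function names and the numbers used are illustrative assumptions.

```python
import math


def fisher_alpha(S, N, tol=1e-10):
    """Fisher's alpha from S species and N individuals: solve S = alpha*ln(1 + N/alpha).

    Uses bisection; requires S < N. The left-hand side increases with alpha."""
    def excess(a):
        return a * math.log(1 + N / a) - S

    lo, hi = 1e-9, 1.0
    while excess(hi) < 0:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def logseries_expected(alpha, N, n):
    """Expected number of species represented by n individuals: alpha * x**n / n."""
    x = N / (N + alpha)  # from N = alpha*x/(1 - x)
    return alpha * x ** n / n


def broken_stick(S):
    """Expected relative abundances y_1..y_S, ranked from most to least abundant."""
    return [sum(1.0 / k for k in range(i, S + 1)) / S for i in range(1, S + 1)]


S_obs = 10 * math.log(101)          # S implied by alpha = 10, N = 1000
alpha_hat = fisher_alpha(S_obs, 1000)
print(round(alpha_hat, 6))          # 10.0
print(broken_stick(2))              # [0.75, 0.25]
```

Plotting `math.log` of the `broken_stick(S)` values against rank gives the corresponding Whittaker dominance-diversity curve.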

**Dimensions of functional traits**:   Plants are multifaceted organisms that have evolved numerous solutions to the problem of establishing, growing, and reproducing with limited resources. The intrinsic dimensionality of plant traits is the minimum number of independent axes of variation that adequately describes the functional variation among plants, and is therefore a fundamental quantity in comparative plant ecology. Laughlin (2014) showed that the dimensionality of plant traits does not exceed six in the most comprehensive dataset. Recent analyses indicate that the ability to predict community composition increases rapidly with additional traits, but reaches a plateau after four to eight traits. To optimize research efficiency for advancing our understanding of trait-based community assembly, ecologists should minimise the number of traits while maximising the number of dimensions, because including multiple correlated traits does not yield dividends and including more than eight traits leads to diminishing returns.

**Seed size and number**:   Recent research suggests that the trade-off between seed size and seed number reflects a trade-off between stress tolerance and fecundity, not between competitive and colonization ability (Lönnberg and Eriksson, 2013; Muller-Landau, 2010).

**Resource**:   According to David Tilman, a resource is any substance or factor that is both consumed by an organism and supports increased population growth rates as its availability in the environment increases. Three things are key to this definition. First, a resource is consumed, and its amount or availability is thereby reduced. Second, a consumer uses a resource for its own maintenance and growth; thus, food is always a resource. Third, when resource availability is reduced, biological processes are affected in such a way as to reduce consumer population growth.

**Competition**:   The capture of essential resources from a common, finite pool by neighbouring individuals (Grime 1979; Trinder et al. 2012). This definition includes both the direct use of common resources by neighbouring individuals and the indirect effect of one plant on the availability of a resource to its neighbour. This automatically defines biomass or seed production by neighbouring individuals (the most usual indirect estimates of competition) as **outcomes** of competition (but, importantly, of other processes too). Perhaps less obviously, this definition means that the production, functioning and maintenance of leaves, stems and roots (and their almost ubiquitous microbial symbionts) are considered **mechanisms** by which the process of competition can operate. Measurements of neither competitive outcomes (seed production, for example) nor competitive mechanisms (root and leaf growth; release of allelochemicals) are direct measurements of the competitive process itself (resource capture). However, while we can readily measure some of the mechanisms by which plants compete (e.g. superior root growth or leaf expansion), it is notoriously difficult to measure the process of competition (resource capture) directly and reliably.

**Asymmetric competition**:   The relationship between competitors is asymmetrical in the sense that each has an advantage with respect to different factors in the environment. For example, one species might exploit resources more efficiently, while the other is better at tolerating stressful conditions or avoiding consumers. Such asymmetries can thus promote coexistence.

**Interference competition**:   Competitors interact directly by aggressively defending resources. E.g. hummingbirds chase other hummingbirds away to protect flower resources.

**Concentration reduction theory**:   The essence of the concentration reduction theory is that it is the average concentration of a limiting nutrient in soil solution that determines growth of individuals and therefore the reduction of this concentration is the mechanism of competition for resources. The concentration reduction theory was originally developed with respect to the dynamics of competition among phytoplankton for nutrients (Titman, 1976; Tilman, 1977). The theory states that when populations of two plant species are competing for a common limiting nutrient, the species that can reduce the average concentration of nutrient in solution to the lowest level and persist at that level will be competitively superior for that nutrient (Tilman, 1980). In equations of nutrient supply, nutrient uptake, and plant growth, the concentration of nutrient in solution is known as $R$ and the lowest concentration at which a population can persist is known as $R^{*}$.
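A minimal sketch of the $R^{*}$ idea, under assumptions that are not in the text: per-capita growth follows a saturating (Monod) function of $R$, all species share one loss rate $m$ (here the dilution rate of a chemostat), and each unit of new biomass uses one unit of resource. Setting growth equal to $m$ gives $R^{*}=Km/(\mu_{max}-m)$. A crude Euler simulation then checks that the species with the lower $R^{*}$ excludes the other and draws $R$ down to its own $R^{*}$. The parameter values are hypothetical.

```python
def monod(R, mu_max, K):
    """Per-capita growth rate as a saturating (Monod) function of resource level R."""
    return mu_max * R / (K + R)


def r_star(mu_max, K, m):
    """Resource level at which Monod growth exactly balances the loss rate m."""
    return K * m / (mu_max - m)


def chemostat(species, supply=10.0, dilution=0.1, dt=0.01, t_end=1000.0):
    """Euler simulation of several species competing for one resource in a chemostat.

    species: list of (mu_max, K) pairs. Each unit of new biomass is assumed to use
    one unit of resource; the dilution rate acts as the loss rate for all species."""
    R = supply
    N = [0.1] * len(species)
    for _ in range(int(t_end / dt)):
        growth = [monod(R, mu, K) for mu, K in species]
        uptake = sum(n * g for n, g in zip(N, growth))
        N = [n + dt * n * (g - dilution) for n, g in zip(N, growth)]
        R += dt * (dilution * (supply - R) - uptake)
    return R, N


species = [(1.0, 1.0), (0.5, 2.0)]  # hypothetical (mu_max, K) values
print([round(r_star(mu, K, 0.1), 3) for mu, K in species])  # [0.111, 0.5]
R_final, N_final = chemostat(species)
print(round(R_final, 3), [round(n, 3) for n in N_final])
```

The second species cannot grow once the first has reduced $R$ below 0.5, so it washes out; this is the competitive superiority of the lowest-$R^{*}$ species described above, in its simplest setting.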

**Supply pre-emption theory**:   Concentration reduction may not be an appropriate theory for competition for nutrients by terrestrial plants because it does not take into account the diffusion of nutrients in soils (Huston & DeAngelis, 1994). In terrestrial systems, diffusion of nutrients in soil solution often limits uptake (Chapin, 1980; Raynaud & Leadley, 2004). Craine et al. (2005) suggested that plants outcompete other plants for nutrients by pre-empting the supply before it comes into contact with other species, rather than by reducing the average concentration. Therefore, it is possible that competition has selected for species that maintain higher root length densities than would be optimal in the absence of competition.