# Ordination

Ordination and the analysis of ecological gradients

[AJ Smit](https://ajsmit.netlify.app)
[ORCID](https://orcid.org/0000-0002-3799-6126)
([University of the Western Cape](https://uwc.ac.za))  
2021-01-01

> **Material required for this chapter**
>
> | Type        | Name                                | Link                                                                |
> |----------------|-------------------|-------------------------------------|
> | **Slides**  | Ordination lecture slides           | [💾 `BCB743_07_ordination.pdf`](../slides/BCB743_07_ordination.pdf) |
> | **Reading** | Vegan–An Introduction to Ordination | [💾 `Oksanen_intro-vegan.pdf`](../docs/Oksanen_intro-vegan.pdf)     |

The following methods are covered in the lecture slides. You are
expected to know how to select the appropriate method and how to
execute each. Supplement your studying with these sources: Numerical
Ecology with R, GUSTA ME, and Analysis of Community Ecology Data in R
(see the links immediately below):

[Principal Component Analysis
(PCA)](https://www.davidzeleny.net/anadat-r/doku.php/en:pca)

[Correspondence Analysis
(CA)](https://www.davidzeleny.net/anadat-r/doku.php/en:ca_dca)

[Principal Coordinate Analysis
(PCoA)](https://www.davidzeleny.net/anadat-r/doku.php/en:pcoa_nmds)

[non-Metric Multidimensional Scaling
(nMDS)](https://www.davidzeleny.net/anadat-r/doku.php/en:pcoa_nmds)

[Redundancy Analysis
(RDA)](https://www.davidzeleny.net/anadat-r/doku.php/en:rda_cca)

[Canonical Correspondence Analysis
(CCA)](https://www.davidzeleny.net/anadat-r/doku.php/en:rda_cca)

[Distance-based Redundancy Analysis
(db-RDA)](https://www.davidzeleny.net/anadat-r/doku.php/en:rda_cca)

## What are Ordinations?

Ordinations are multivariate statistical techniques used to analyse and
visualise complex, high-dimensional data, such as ecological community
data. While clustering methods (see Topic 13) focus on identifying
discontinuities or groups within the data, ordinations aim to highlight
and interpret gradients, which are ubiquitous in ecological communities.
Ordinations are particularly well-suited for handling multivariate data,
which can represent:

-   A spatial context (e.g., a landscape) comprised of many sites
    (rows), each one characterised by multiple variables (columns), such
    as species abundances or environmental factors.
-   A time series (e.g., repeated sampling) comprised of many samples
    (rows), each one containing multiple variables (columns), such as
    species or environmental variables.
-   Multidimensional or multivariate data, where the number of
    dimensions (columns) approaches the number of samples (sites or
    times).

In such complex, high-dimensional data, analysing each variable
separately using a series of univariate or bivariate analyses would be
inefficient and unlikely to reveal the underlying patterns accurately.
For example, the Doubs River dataset contains 27 fish species, so
examining every pair of species would require (27 × 26) / 2 = 351
separate bivariate analyses, which is impractical and prone to
misinterpretation.
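
The count of 351 is the number of unordered pairs that can be formed from 27 variables. A minimal sketch of that arithmetic, assuming placeholder species labels (`sp1` to `sp27`) that are not the real Doubs species codes:

```python
from itertools import combinations
from math import comb

n_species = 27  # the figure used in the text for the Doubs River data

# Number of unordered pairs: n * (n - 1) / 2
n_pairs = comb(n_species, 2)

# Enumerate the pairs explicitly to confirm the count.
# The labels are placeholders, not the real species codes.
species = [f"sp{i + 1}" for i in range(n_species)]
pairs = list(combinations(species, 2))

print(n_pairs, len(pairs))
```

The number of pairs grows quadratically with the number of variables, which is why pairwise analysis quickly becomes unmanageable.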

## Dimension Reduction

Ordinations are dimension reduction methods. They:

-   Take high-dimensional data (many columns).
-   Apply scaling and rotation.
-   Reduce the complexity to a low-dimensional space (orthogonal axes).

Ordinations represent the complex data along a reduced number of
orthogonal axes (linearly independent and uncorrelated), constructed in
such a way that they capture the main trends or gradients in the data in
decreasing order of importance. Each orthogonal axis captures a portion
of the variation attributed to the original variables (columns).
Interpretation of these axes is aided by visualisations (biplots),
regressions, and clustering techniques.
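
The idea of centring, rotating, and ordering axes by captured variation can be sketched for the simplest possible case: a table with only two variables. This is an illustrative PCA written with the standard library only, not the implementation used by any particular package. The function name `pca_2d` and the data are hypothetical, and the variables are centred but not scaled.

```python
import math

def pca_2d(rows):
    """PCA for a table with exactly two columns (hypothetical helper)."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(2)]
    centred = [(r[0] - means[0], r[1] - means[1]) for r in rows]

    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centred) / (n - 1)
    c = sum(y * y for _, y in centred) / (n - 1)
    b = sum(x * y for x, y in centred) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2 x 2 matrix, largest first
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    eigenvalues = [mid + half_gap, mid - half_gap]

    # Angle of the first principal axis, then the two orthogonal axes
    theta = 0.5 * math.atan2(2 * b, a - c)
    axis1 = (math.cos(theta), math.sin(theta))
    axis2 = (-math.sin(theta), math.cos(theta))

    # Site scores: centred data projected onto the rotated axes
    scores = [
        (x * axis1[0] + y * axis1[1], x * axis2[0] + y * axis2[1])
        for x, y in centred
    ]
    return eigenvalues, (axis1, axis2), scores

# Hypothetical measurements of two correlated variables at five sites
sites = [[2.0, 1.9], [4.0, 4.1], [6.0, 5.8], [8.0, 8.3], [10.0, 9.9]]
eigenvalues, axes, scores = pca_2d(sites)
print([round(ev, 3) for ev in eigenvalues])
```

Because the two variables are strongly correlated, almost all of the variation falls on the first axis, so the second axis could be dropped with little loss of information.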

Essentially, ordinations geometrically arrange (project) sites or
species into a simplified dataset, where distances between them in the
Cartesian 2D or 3D space represent their ecological or species
dissimilarities. In this simplified representation, the further apart
the shapes representing sites or species are on the graph, the larger
the ecological differences between them.

> **Analogy of what an ordination does**
>
> Imagine you have a 3D pear and a strong beam of light that casts the
> pear’s shadow onto a flat surface. When you place the pear in the beam
> of light, the shadow that forms on the surface represents a 2D
> projection of the 3D object. Depending on how you rotate the pear, the
> shadow can appear in different shapes. Sometimes, it looks like the
> characteristic pear shape, while other times, it might resemble a
> round disc or an elongated ellipse.
>
> ‘Projection’ in ordination works in a similar way. Consider the
> original data as the 3D pear, existing in a high-dimensional space
> where each dimension represents a different variable. The goal of
> ordination is to find new axes (principal components) that capture the
> most insightful variations in the data. Choosing these axes is akin to
> choosing how to rotate the pear in the beam of light to cast the shadow.
>
> When you ‘project’ the data onto these new axes, you are essentially
> rotating the pear in the light beam to create a 2D (or
> lower-dimensional) shadow on a plane. This shadow, or projection,
> represents the data in a reduced form. Just like rotating the pear
> reveals different shapes of shadows, rotating the data (changing the
> axes) in ordination can reveal different structures and patterns
> within the data. Some rotations will clearly show the underlying
> structure (e.g., the pear shape), while others might obscure it (e.g.,
> the round disc).
>
> This process of projection helps in visualising complex,
> high-dimensional data in a simpler form and makes it easier to
> identify patterns, clusters, and relationships between variables.

The axes are ordered by the amount of variation they capture, with the
first axis capturing the most variation, the second axis capturing the
second most, and so on. The axes are orthogonal, so they are
uncorrelated. In eigen-analysis methods such as PCA, they are linear
combinations of the original variables, which makes them interpretable.

“Ordination primarily endeavours to represent sample and species
relationships as faithfully as possible in a low-dimensional space”
(Gauch, 1982). This is necessary because visualising multiple dimensions
(species or variables) simultaneously in community data is extremely
challenging, if not impossible. Ordination is a compromise between the
number of dimensions and the amount of information retained. Ecologists
are frequently confronted by tens, if not hundreds, of variables,
species, and samples. A single multivariate analysis also saves time
compared to
conducting separate univariate analyses for each species or variable.
What we really want is for the dimensions of this ‘low-dimensional
space’ to represent important and interpretable environmental gradients.

## Benefits of Ordinations

An ecological reason for preferring ordinations over multiple univariate
analyses is that species do not occur in isolation but in communities.
Species in a community are interdependent and influenced by the same
environmental factors. As such, community patterns may differ from
population patterns. Some ordination methods can also offer insights
into β diversity, which is the variation in species composition among
sites.

A statistical reason for avoiding multiple univariate analyses is the
increased probability of making a Type I error (rejecting a true null
hypothesis) when many tests are run, known as the problem of multiple
comparisons. In contrast, a multivariate analysis relies on a single
test, and it can gain statistical power by considering species in
aggregate, because community data contain much redundancy.
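
To see how quickly the risk of a Type I error accumulates, the sketch below computes the probability of at least one false positive across a set of tests. It assumes the tests are independent and each uses the same significance level, which is a simplification for ecological data. The function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

# One test, ten tests, one test per species, and one test per species pair
for n_tests in (1, 10, 27, 351):
    print(n_tests, round(familywise_error_rate(n_tests), 3))
```

With a few dozen tests, at least one spurious 'significant' result becomes more likely than not.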

Ordination focuses on ‘important dimensions,’ avoiding the
interpretation of noise, thus acting as a ‘noise reduction technique’
(Gauch, 1982). It allows determining the relative importance of
different gradients, which is virtually impossible with univariate
techniques. For example, one can assess whether the first axis
represents a stronger gradient than the second axis.
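
In eigen-analysis methods, the relative importance of the axes is usually read from the eigenvalues. A minimal sketch, assuming a made-up set of eigenvalues (they are not taken from any real dataset):

```python
def proportion_explained(eigenvalues):
    """Share of the total variation captured by each ordination axis."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

def cumulative(proportions):
    """Running total of the proportions, axis by axis."""
    running, out = 0.0, []
    for p in proportions:
        running += p
        out.append(running)
    return out

# Hypothetical eigenvalues, already sorted from largest to smallest
eigenvalues = [5.2, 2.1, 0.9, 0.5, 0.3]
proportions = proportion_explained(eigenvalues)

for axis, (p, c) in enumerate(zip(proportions, cumulative(proportions)), start=1):
    print(f"Axis {axis}: {p:.1%} (cumulative {c:.1%})")
```

Comparing the first two proportions is one way to judge whether the first axis represents a much stronger gradient than the second.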

A major benefit of ordination is that its numeric output lends itself to
graphical representation, often leading to intuitive interpretations of
species-environment relationships. This is useful for communicating
results to non-specialists.

## Types of Ordinations

The first type of ordination techniques includes **eigen-analysis
methods**, which use linear algebra for dimensionality reduction. The
second type of ordination techniques includes **non-eigen-analysis
methods**, which use iterative algorithms for dimensionality reduction.
I will cover both classes in this lecture, with non-Metric
Multidimensional Scaling being the only example of the second class.

The eigen-analysis methods produce outputs called eigenvectors and
eigenvalues, which are then used to determine the most important
patterns or gradients in the data. The properties and applications of
eigenvectors and eigenvalues will be covered in subsequent sections. The
non-eigen approach instead uses numerical optimisation to find the best
representation of the data in a lower-dimensional space.

Below, I prefer a classification of the ordination methods into
constrained and unconstrained methods. This classification is based on
the type of information used to construct the ordination axes, and how
they are used. Constrained methods use environmental data to construct
the axes, while unconstrained methods do not. The main difference
between these two classes is that constrained methods are
hypothesis-driven, while unconstrained methods are exploratory.

### Unconstrained Ordination (Indirect Gradient Analysis)

These are not statistical techniques (no inference testing); they are
purely descriptive. Sometimes they are called indirect gradient
analysis. These analyses are based on either the environment × sites
matrix or the species × sites matrix, each analysed and interpreted in
isolation. The main goal is to find the main gradients in the data. We
apply indirect gradient analysis when the gradients are unknown *a
priori*, and we do not have environmental data related to the species.
Gradients or other influences that structure species in space are
therefore inferred from the species composition data only. The
communities thus reveal the presence (or absence) of gradients, but may
not offer insight into the identity of the structuring gradients. The
most common methods are:

-   **[Principal Component Analysis (PCA)](pca.qmd):** The main
    eigenvector-based method, working on raw, quantitative data. It
    preserves the Euclidean (linear) distances among sites, mainly used
    for environmental data but also applicable to species
    dissimilarities.
-   **[Correspondence Analysis (CA)](ca.qmd):** Works on data that must
    be frequencies or frequency-like, dimensionally homogeneous, and
    non-negative. It preserves the $\chi^2$ distances among rows or
    columns, mainly used in ecology to analyse species data tables.
-   **[Principal Coordinate Analysis (PCoA)](pcoa.qmd):** Devoted to the
    ordination of dissimilarity or distance matrices, often in the Q
    mode instead of site-by-variables tables, offering great flexibility
    in the choice of association measures.
-   **[non-Metric Multidimensional Scaling (nMDS)](nmds.qmd):** A
    non-eigen-analysis method that works on dissimilarity or rank-order
    distance matrices to study the relationship between sites or
    species. nMDS represents objects along a predetermined number of
    axes while preserving the ordering relationships among them.
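
The 'ordering relationships' that nMDS tries to preserve are the ranks of the pairwise dissimilarities, not their magnitudes. The sketch below is not an nMDS implementation (which positions points iteratively). It only shows, with hypothetical values, that a strictly increasing transformation changes the dissimilarities but leaves their rank order intact.

```python
def rank_order(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

# Hypothetical dissimilarities for the six site pairs among four sites
dissimilarities = [0.10, 0.45, 0.80, 0.30, 0.65, 0.25]

# A strictly increasing transformation changes the values...
stretched = [d ** 2 for d in dissimilarities]

# ...but not the order in which the site pairs are ranked
print(rank_order(dissimilarities))
print(rank_order(stretched))
```

This is why nMDS is described as working on rank-order information: two dissimilarity matrices with the same ranks carry the same information for the method.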

### Constrained Ordination (Direct Gradient Analysis)

Constrained ordination (Topic 12) adds a level of statistical testing
and is also called direct gradient analysis or canonical ordination. It
typically uses explanatory variables (in the environmental matrix) to
explain the patterns seen in the species matrix. The main goal is to
find the main gradients in the data and test the significance of these
gradients. So, we use constrained ordination when important gradients
are hypothesised. Likely evidence for the existence of gradients is
measured and captured in a complementary environmental dataset that has
the same spatial structure (rows) as the species dataset. Direct
gradient analysis is performed using linear or non-linear regression
approaches that relate the ordinations performed on the species and
their matching environmental variables. The most common methods are:

-   **Redundancy Analysis (RDA)**: A constrained form of PCA, where
    ordination is constrained by environmental variables, used to study
    the relationship between species and environmental variables.
-   **Canonical Correspondence Analysis (CCA)**: A constrained form of
    CA, where ordination is constrained by environmental variables, used
    to study the relationship between species and environmental
    variables.
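-   **Detrended Correspondence Analysis (DCA)**: A detrended form of
    CA rather than a constrained method. It uses species data only and
    is noted here because of its close relation to CA and CCA; it
    corrects the arch effect that CA can produce along long gradients.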
-   **[Distance-based Redundancy Analysis
    (db-RDA)](constrained_ordination.qmd):** A constrained form of PCoA,
    where ordination is constrained by environmental variables, used to
    study the relationship between species and environmental variables.
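
The core idea behind a constrained method such as RDA can be sketched with a single explanatory variable: each species is regressed on the environmental variable, and the variation in the fitted values is the 'constrained' part of the total variation. This is a simplified illustration with hypothetical data and helper names. A full RDA would handle several explanatory variables at once and then apply a PCA to the fitted values.

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def fitted_values(y, x):
    """Ordinary least-squares fit of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return [my + slope * (xi - mx) for xi in x]

# Hypothetical data: six sites along an elevation gradient, three species
elevation = [100, 200, 300, 400, 500, 600]
species = {
    "sp_a": [12, 10, 9, 6, 4, 1],   # declines with elevation
    "sp_b": [1, 3, 4, 6, 9, 11],    # increases with elevation
    "sp_c": [5, 4, 6, 5, 4, 6],     # largely unrelated to elevation
}

total_var = sum(variance(y) for y in species.values())
constrained_var = sum(
    variance(fitted_values(y, elevation)) for y in species.values()
)
proportion_constrained = constrained_var / total_var

print(round(proportion_constrained, 3))
```

The remaining, unconstrained variation is what the environmental variable fails to explain. In practice, the significance of the constrained part is usually assessed by permutation.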

PCoA and nMDS can produce ordinations from any square dissimilarity or
distance matrix, offering more flexibility than PCA and CA, which
require site-by-variable tables (e.g., site-by-species). Depending on
the dissimilarity measure chosen, PCoA and nMDS can also be more robust
to outliers and missing data than PCA and CA.
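
As an illustration of the kind of square dissimilarity matrix that PCoA and nMDS take as input, the sketch below builds a Bray-Curtis matrix from a small site-by-species table. Bray-Curtis is only one of many possible measures and is chosen here as an assumption. The abundances are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two sites' abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

def dissimilarity_matrix(table):
    """Square, symmetric matrix of pairwise site dissimilarities."""
    return [[bray_curtis(r1, r2) for r2 in table] for r1 in table]

# Hypothetical abundances: three sites (rows) by three species (columns)
abundances = [
    [10, 0, 5],
    [8, 2, 5],
    [0, 9, 1],
]

matrix = dissimilarity_matrix(abundances)
for row in matrix:
    print([round(d, 3) for d in row])
```

Sites 1 and 2 share most of their species and abundances, so their dissimilarity is small, whereas site 3 is very different from both.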

## References