Updated book (subsamples -> subsets)
TGuillerme committed Nov 15, 2017
1 parent 480223a commit 8e46dba
Showing 25 changed files with 635 additions and 640 deletions.
28 changes: 14 additions & 14 deletions disparity_object.md
@@ -7,29 +7,29 @@ object
|
\---$call* = class:"list" (details of the methods used)
| |
| \---$subsamples = class:"character"
| \---$subsets = class:"character"
| |
| \---$bootstrap = class:"character"
| |
| \---$dimensions = class:"numeric"
| |
| \---$metric = class:"character"
|
\---$subsamples* = class:"list" (subsamples as a list)
\---$subsets* = class:"list" (subsets as a list)
| |
| \---[[1]]* = class:"list" (first item in subsamples list)
| \---[[1]]* = class:"list" (first item in subsets list)
| | |
| | \---$elements* = class:"matrix" (one column matrix containing the elements within the first subsample)
| | \---$elements* = class:"matrix" (one column matrix containing the elements within the first subset)
| | |
| | \---[[2]] = class:"matrix" (matrix containing the bootstrap draws for the unrarefied data)
| | |
| | \---[[3]] = class:"matrix" (matrix containing the bootstrap draws for the first rarefaction level)
| | |
| | \---[[...]] = class:"matrix" (matrix containing the bootstrap draws for the second rarefaction level etc.)
| |
| \---[[2]] = class:"list" (second item in subsamples list)
| \---[[2]] = class:"list" (second item in subsets list)
| | |
| | \---$elements* = class:"matrix" (one column matrix containing the elements within the second subsample)
| | \---$elements* = class:"matrix" (one column matrix containing the elements within the second subset)
| | |
| | \---[[2]] = class:"matrix" (matrix containing the bootstrap draws for the unrarefied data)
| | |
@@ -41,27 +41,27 @@ object
| | | |
| | \---[[...]] = class:"numeric" (the bootstraps)
| |
| \---[[...]] = class:"list" (the following subsamples)
| \---[[...]] = class:"list" (the following subsets)
| |
| \---$elements* = class:"matrix" (a one column matrix containing the elements within this subsample)
| \---$elements* = class:"matrix" (a one column matrix containing the elements within this subset)
| |
| \---[[...]] = class:"matrix" (the rarefactions)
|
\---$disparity
|
\---[[2]] = class:"list" (the first subsamples)
\---[[2]] = class:"list" (the first subsets)
| |
| \---$observed* = class:"numeric" (vector containing the observed disparity within the subsamples)
| \---$observed* = class:"numeric" (vector containing the observed disparity within the subsets)
| |
| \---[[2]] = class:"matrix" (matrix containing the bootstrap draws for the unrarefied data)
| |
| \---[[3]] = class:"matrix" (matrix containing the bootstrap draws for the first rarefaction level)
| |
| \---[[...]] = class:"matrix" (matrix containing the bootstrap draws for the second rarefaction level etc.)
|
\---[[2]] = class:"list" (the first subsamples)
\---[[2]] = class:"list" (the first subsets)
| |
| \---$observed* = class:"numeric" (vector containing the observed disparity within the subsamples)
| \---$observed* = class:"numeric" (vector containing the observed disparity within the subsets)
| |
| \---[[2]] = class:"matrix" (matrix containing the bootstrap draws for the unrarefied data)
| |
@@ -73,9 +73,9 @@ object
| |
| \---[[...]] = class:"numeric" (the bootstraps)
|
\---[[...]] = class:"list" (the following subsamples)
\---[[...]] = class:"list" (the following subsets)
|
\---$observed* = class:"numeric" (the vector containing the observed disparity within this subsamples)
\---$observed* = class:"numeric" (the vector containing the observed disparity within this subset)
|
\---[[...]] = class:"matrix" (the rarefactions)
```
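
To make the layout in the tree above easier to picture, here is a minimal base R sketch of a nested list with a similar shape. It is a hand-built mock under stated assumptions, not output from the package: the names `space` and `mock`, the group names and all values are hypothetical.

```r
## Hand-built mock of the nested layout (NOT produced by dispRity)
set.seed(1)
space <- matrix(rnorm(20), nrow = 10, ncol = 2)

mock <- list(
    ## $call: details of the methods used
    call = list(subsets = "customised", dimensions = ncol(space)),
    ## $subsets: one list per subset
    subsets = list(
        first = list(
            ## one-column matrix of the elements (row indices) in the subset
            elements = matrix(1:5, ncol = 1),
            ## unnamed second item: bootstrap draws, one column per replicate
            matrix(sample(1:5, size = 5 * 3, replace = TRUE), nrow = 5)
        ),
        second = list(elements = matrix(6:10, ncol = 1))
    )
)

## The elements of a subset index rows of the space
first_rows <- mock$subsets$first$elements[, 1]
dim(space[first_rows, , drop = FALSE])
```

Storing row indices rather than copies of the data keeps each subset small, which is one plausible reason for a layout like this.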
6 changes: 3 additions & 3 deletions inst/gitbook/01_glossary.Rmd
@@ -19,6 +19,6 @@ output:

- **Dimensions**. The columns of the multidimensional space matrix. The dimensions can be referred to as axes of variation, or principal components, for ordinated spaces obtained from a PCA for example.

- **Subsamples**. Subsamples of the multidimensional space.
A subsample (or subsamples) contains the same number of dimensions as the space but may contain a smaller subset of elements.
For example, if our space is composed of birds and mammals (the elements) and 50 principal components of variation (the dimensions), we can create two subsamples containing just mammals or birds, but with the same 50 dimensions, to compare disparity in the two clades.
- **Subsets**. Subsets of the multidimensional space.
A subset (or subsets) contains the same number of dimensions as the space but may contain a smaller subset of elements.
For example, if our space is composed of birds and mammals (the elements) and 50 principal components of variation (the dimensions), we can create two subsets containing just mammals or birds, but with the same 50 dimensions, to compare disparity in the two clades.
8 changes: 4 additions & 4 deletions inst/gitbook/02_getting-started.Rmd
@@ -62,7 +62,7 @@ geomorph.ordination(procrustes)[1:5,1:5]

Options for the ordination (from `?prcomp`) can be directly passed to this function to perform customised ordinations.
Additionally you can give the function a `geomorph.data.frame` object.
If the latter contains sorting information (i.e. factors), they can be directly used to make a customised `dispRity` object [customised `dispRity` object](#customised-subsamples)!
If the latter contains sorting information (i.e. factors), it can be directly used to make a [customised `dispRity` object](#customised-subsets)!

```{r}
## Using a geomorph.data.frame
@@ -126,7 +126,7 @@ For example, which metric should you use?
How many bootstraps do you require?
What model of evolution is most appropriate if you are time slicing?
Should you rarefy the data?
See [`time.subsamples`](#time-slicing), [`custom.subsamples`](#customised-subsamples), [`boot.matrix`](#bootstraps-and-rarefactions) and [`dispRity.metric`](#disparity-metrics) for more details of the defaults used in each of these functions.
See [`time.subsets`](#time-slicing), [`custom.subsets`](#customised-subsets), [`boot.matrix`](#bootstraps-and-rarefactions) and [`dispRity.metric`](#disparity-metrics) for more details of the defaults used in each of these functions.
Note that any of these default arguments can be changed within the `disparity.through.time` or `disparity.per.group` functions.

### Example data
@@ -168,7 +168,7 @@ For a disparity through time analysis, you will need:

* An ordinated matrix (we covered that above)
* A phylogenetic tree: this must be a `phylo` object (from the `ape` package) and needs a `root.time` element. To give your tree a root time (i.e. an age for the root), you can simply do\\ `my_tree$root.time <- my_age`.
* The required number of time subsamples (here `time = 3`)
* The required number of time subsets (here `time = 3`)
* Your favourite disparity metric (here the sum of variances)

Using the Beck and Lee (2014) data described [above](#example-data):
@@ -187,7 +187,7 @@ When displayed, these `dispRity` objects provide us with information on the oper
disparity_data
```

We asked for three subsamples (evenly spread across the age of the tree), the data was bootstrapped 100 times (default) and the metric used was the sum of variances.
We asked for three subsets (evenly spread across the age of the tree), the data was bootstrapped 100 times (default) and the metric used was the sum of variances.
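
As a rough illustration of what "bootstrapped 100 times" with "the sum of variances" means, the following base R sketch computes that metric on a toy matrix and on 100 resampled versions of it. The data and the helper name `sum_of_variances` are hypothetical, and this is not the package's internal code.

```r
## Toy space: 20 elements (rows) by 5 dimensions (columns), random values
set.seed(123)
toy_space <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

## Sum of variances: the variance of each dimension, summed
sum_of_variances <- function(space) sum(apply(space, 2, var))

## Observed (non-bootstrapped) value
observed <- sum_of_variances(toy_space)

## 100 bootstrap pseudo-replicates: resample the rows with replacement
bootstrapped <- replicate(100, {
    rows <- sample(nrow(toy_space), replace = TRUE)
    sum_of_variances(toy_space[rows, , drop = FALSE])
})

observed
summary(bootstrapped)
```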

We can now summarise or plot the `disparity_data` object, or perform statistical tests on it (e.g. a simple `lm`):

74 changes: 37 additions & 37 deletions inst/gitbook/03_specific-tutorials.Rmd
@@ -25,13 +25,13 @@ data(BeckLee_tree) ; data(BeckLee_ages)

## Time slicing

The function `time.subsamples` allows users to divide the matrix into different time subsamples or slices given a dated phylogeny that contains all the elements (i.e. taxa) from the matrix.
Each subsample generated by this function will then contain all the elements present at a specific point in time or during a specific period in time.
The function `time.subsets` allows users to divide the matrix into different time subsets or slices given a dated phylogeny that contains all the elements (i.e. taxa) from the matrix.
Each subset generated by this function will then contain all the elements present at a specific point in time or during a specific period in time.

Two types of time subsamples can be performed by using the `method` option:
Two types of time subsets can be performed by using the `method` option:

* Discrete time subsamples (or time-binning) using `method = discrete`
* Continuous time subsamples (or time-slicing) using `method = continuous`
* Discrete time subsets (or time-binning) using `method = "discrete"`
* Continuous time subsets (or time-slicing) using `method = "continuous"`
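
The difference between the two methods listed above can be sketched in base R using only first and last appearance dates. This deliberately ignores the tree, nodes and any evolutionary model, so it conveys the intuition only and is not how the package function works internally. The `ages` table and the helpers `in_bin` and `at_slice` are hypothetical.

```r
## Hypothetical first/last appearance dates (in Ma) for five taxa
ages <- data.frame(FAD = c(110, 95, 70, 50, 10),
                   LAD = c(100, 60, 65, 0, 0),
                   row.names = paste0("taxon", 1:5))
bounds <- c(120, 80, 40, 0)

## Discrete (bins): keep a taxon if its range overlaps the time period
in_bin <- function(ages, older, younger) {
    rownames(ages)[ages$FAD > younger & ages$LAD < older]
}
bins <- Map(function(older, younger) in_bin(ages, older, younger),
            bounds[-length(bounds)], bounds[-1])
names(bins) <- paste(bounds[-length(bounds)], bounds[-1], sep = " - ")

## Continuous (slices): keep a taxon if it exists at that point in time
at_slice <- function(ages, time) {
    rownames(ages)[ages$FAD >= time & ages$LAD <= time]
}
slices <- lapply(bounds, at_slice, ages = ages)
names(slices) <- bounds

bins
slices
```

With tip data alone, slices tend to sample far fewer elements than bins, which hints at why the continuous method also relies on the tree.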

For the time-slicing method details see Cooper and Guillerme (in prep.). <!-- @@@ Change cite appropriately! -->
<!-- NC: Or potentially this paper we are writing for the PalAss? TG: totally!-->
@@ -47,15 +47,15 @@ Here is an example for `method = discrete`:

```{r, eval=TRUE}
## Generating three time bins containing the taxa present every 40 Ma
time.subsamples(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
time.subsets(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
time = c(120, 80, 40, 0))
```

Note that we can also generate equivalent results by just telling the function that we want three time-bins as follows:

```{r, eval=TRUE}
## Automatically generate three equal length bins:
time.subsamples(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
time.subsets(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
time = 3)
```

@@ -70,7 +70,7 @@ This table should have the taxa names as row names and two columns for respectiv
head(BeckLee_ages)
## Generating time bins including taxa that might span between them
time.subsamples(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
time.subsets(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
time = c(120, 80, 40, 0), FADLAD = BeckLee_ages)
```

@@ -96,18 +96,18 @@ These later models perform better when bootstrapped, effectively approximating t

```{r, eval=TRUE}
## Generating four time slices every 40 million years under a model of proximity evolution
time.subsamples(data = BeckLee_mat99, tree = BeckLee_tree,
time.subsets(data = BeckLee_mat99, tree = BeckLee_tree,
method = "continuous", model = "proximity", time = c(120, 80, 40, 0),
FADLAD = BeckLee_ages)
## Generating four time slices automatically
time.subsamples(data = BeckLee_mat99, tree = BeckLee_tree,
time.subsets(data = BeckLee_mat99, tree = BeckLee_tree,
method = "continuous", model = "proximity", time = 4, FADLAD = BeckLee_ages)
```

## Customised subsamples
## Customised subsets

Another way of separating elements into different categories is to use customised subsamples as briefly explained [above](#disparity-among-groups).
Another way of separating elements into different categories is to use customised subsets as briefly explained [above](#disparity-among-groups).
This function simply takes the list of elements to put in each group (whether they are the actual element names or their position in the matrix).
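
The point that groups can be given either as element names or as positions in the matrix can be illustrated with plain matrix indexing. This is only a base R sketch of the idea, with a hypothetical toy matrix and a hypothetical helper `split_space`; it does not reproduce the package function.

```r
## Toy space: 6 named elements by 3 dimensions, random values
set.seed(7)
toy_space <- matrix(rnorm(6 * 3), nrow = 6, ncol = 3,
                    dimnames = list(paste0("sp", 1:6), NULL))

## The same two groups, described by position and by name
groups_by_position <- list(crown = c(1, 3, 5), stem = c(2, 4, 6))
groups_by_name <- list(crown = c("sp1", "sp3", "sp5"),
                       stem = c("sp2", "sp4", "sp6"))

## Keep all dimensions, select only the rows of each group
split_space <- function(space, groups) {
    lapply(groups, function(rows) space[rows, , drop = FALSE])
}

by_position <- split_space(toy_space, groups_by_position)
by_name <- split_space(toy_space, groups_by_name)

## Both descriptions select the same rows
identical(by_position, by_name)
```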

```{r, eval=TRUE}
@@ -116,7 +116,7 @@ mammal_groups <- list("crown" = c(16, 19:41, 45:50),
"stem" = c(1:15, 17:18, 42:44))
## Separating the dataset into two different groups
custom.subsamples(BeckLee_mat50, group = mammal_groups)
custom.subsets(BeckLee_mat50, group = mammal_groups)
```

Elements can easily be assigned to different groups if necessary!
@@ -156,7 +156,7 @@ Rarefaction allows users to limit the number of elements to be drawn at each boo
This is useful if, for example, one is interested in looking at the effect of reducing the number of elements on the results of an analysis.

This can be achieved by using the `rarefaction` option that draws only *n-x* elements at each bootstrap replicate (where *x* is the number of elements not sampled).
The default argument is `FALSE` but it can be set to `TRUE` to fully rarefy the data (i.e. remove *x* elements for the number of pseudo-replicates, where *x* varies from the maximum number of elements present in each subsample to a minimum of three elements).
The default argument is `FALSE` but it can be set to `TRUE` to fully rarefy the data (i.e. remove *x* elements for the number of pseudo-replicates, where *x* varies from the maximum number of elements present in each subset to a minimum of three elements).
It can also be set to one or more `numeric` values to only rarefy to the corresponding number of elements.
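
The three behaviours described above (no rarefaction, rarefaction to a given number of elements, and full rarefaction down to three elements) can be sketched in base R as follows. The helper `bootstrap_rows` and the toy data are hypothetical; the sketch only shows the resampling logic and is not the package's implementation.

```r
## Toy space: 12 elements by 4 dimensions, random values
set.seed(2017)
toy_space <- matrix(rnorm(12 * 4), nrow = 12, ncol = 4)

## Draw `n_draw` row indices with replacement, `bootstraps` times;
## each column of the result is one pseudo-replicate
bootstrap_rows <- function(space, bootstraps, n_draw = nrow(space)) {
    replicate(bootstraps, sample(nrow(space), size = n_draw, replace = TRUE))
}

## No rarefaction: each replicate draws all 12 elements
full <- bootstrap_rows(toy_space, bootstraps = 100)

## Rarefied to a single level of 6 elements
rarefied <- bootstrap_rows(toy_space, bootstraps = 100, n_draw = 6)

## Full rarefaction: one set of draws per level, from 12 elements down to 3
rare_levels <- nrow(toy_space):3
all_levels <- lapply(rare_levels,
                     function(n) bootstrap_rows(toy_space, 100, n_draw = n))
names(all_levels) <- rare_levels

dim(full)
dim(rarefied)
length(all_levels)
```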

```{r, eval=TRUE}
@@ -178,23 +178,23 @@ boot.matrix(BeckLee_mat50, dimensions = 0.5)
boot.matrix(BeckLee_mat50, dimensions = 10)
```

Of course, one could directly supply the subsamples generated above (using `time.subsamples` or `custom.subsamples`) to this function.
Of course, one could directly supply the subsets generated above (using `time.subsets` or `custom.subsets`) to this function.

```{r, eval=TRUE}
## Creating subsamples of crown and stem mammals
crown_stem <- custom.subsamples(BeckLee_mat50,
## Creating subsets of crown and stem mammals
crown_stem <- custom.subsets(BeckLee_mat50,
group = list("crown" = c(16, 19:41, 45:50),
"stem" = c(1:15, 17:18, 42:44)))
## Bootstrapping and rarefying these groups
boot.matrix(crown_stem, bootstraps = 200, rarefaction = TRUE)
## Creating time slice subsamples
time_slices <- time.subsamples(data = BeckLee_mat99, tree = BeckLee_tree,
## Creating time slice subsets
time_slices <- time.subsets(data = BeckLee_mat99, tree = BeckLee_tree,
method = "continuous", model = "proximity",
time = c(120, 80, 40, 0),
FADLAD = BeckLee_ages)
## Bootstrapping the time slice subsamples
## Bootstrapping the time slice subsets
boot.matrix(time_slices, bootstraps = 100)
```

@@ -410,7 +410,7 @@ The functions `ellipse.volume`, `convhull.surface` and `convhull.volume` all me
## Calculating the ellipsoid volume
summary(dispRity(dummy_space, metric = ellipse.volume))
```
> Because there is only one subsample (i.e. one matrix) in the dispRity object, this operation is the equivalent of `ellipse.volume(dummy_space)` (with rounding).
> Because there is only one subset (i.e. one matrix) in the dispRity object, this operation is the equivalent of `ellipse.volume(dummy_space)` (with rounding).
```{r}
## Calculating the convex hull surface
@@ -496,28 +496,28 @@ This function is an S3 function (`summary.dispRity`) allowing users to summarise

```{r}
## Example data from previous sections
crown_stem <- custom.subsamples(BeckLee_mat50,
crown_stem <- custom.subsets(BeckLee_mat50,
group = list("crown" = c(16, 19:41, 45:50),
"stem" = c(1:15, 17:18, 42:44)))
## Bootstrapping and rarefying these groups
boot_crown_stem <- boot.matrix(crown_stem, bootstraps = 100, rarefaction = TRUE)
## Calculate disparity
disparity_crown_stem <- dispRity(boot_crown_stem, metric = c(sum, variances))
## Creating time slice subsamples
time_slices <- time.subsamples(data = BeckLee_mat99, tree = BeckLee_tree,
## Creating time slice subsets
time_slices <- time.subsets(data = BeckLee_mat99, tree = BeckLee_tree,
method = "continuous", model = "proximity", time = c(120, 80, 40, 0),
FADLAD = BeckLee_ages)
## Bootstrapping the time slice subsamples
## Bootstrapping the time slice subsets
boot_time_slices <- boot.matrix(time_slices, bootstraps = 100)
## Calculate disparity
disparity_time_slices <- dispRity(boot_time_slices, metric = c(sum, variances))
## Creating time bin subsamples
time_bins <- time.subsamples(data = BeckLee_mat99, tree = BeckLee_tree,
## Creating time bin subsets
time_bins <- time.subsets(data = BeckLee_mat99, tree = BeckLee_tree,
method = "discrete", time = c(120, 80, 40, 0), FADLAD = BeckLee_ages,
inc.nodes = TRUE)
## Bootstrapping the time bin subsamples
## Bootstrapping the time bin subsets
boot_time_bins <- boot.matrix(time_bins, bootstraps = 100)
## Calculate disparity
disparity_time_bins <- dispRity(boot_time_bins, metric = c(sum, variances))
@@ -530,7 +530,7 @@ These objects are easy to summarise as follows:
summary(disparity_time_slices)
```

Information about the number of elements in each subsample and the observed (i.e. non-bootstrapped) disparity are also calculated.
Information about the number of elements in each subset and the observed (i.e. non-bootstrapped) disparity are also calculated.
This is particularly handy when rarefying the data, for example:

```{r}
@@ -573,9 +573,9 @@ The plots can be of four different types:
* `continuous` for displaying continuous disparity curves
* `box`, `lines`, and `polygons` to display discrete disparity results in respectively a boxplot, confidence interval lines, and confidence interval polygons.

> This argument can be left empty. In this case, the algorithm will automatically detect the type of subsamples from the `dispRity` object and plot accordingly.
> This argument can be left empty. In this case, the algorithm will automatically detect the type of subsets from the `dispRity` object and plot accordingly.
It is also possible to display the number of elements in each subsample (as a horizontal dotted line) using the option `elements = TRUE`.
It is also possible to display the number of elements in each subset (as a horizontal dotted line) using the option `elements = TRUE`.
Additionally, when the data is rarefied, one can indicate which level of rarefaction to display (i.e. only display the results for a certain number of elements) by using the `rarefaction` argument.

```{r, fig.width=8, fig.height=8}
@@ -628,7 +628,7 @@ op <- par(bty = "n")
## Plotting the results with some plot.dispRity arguments
plot(disparity_time_slices, quantile = c(seq(from = 10, to = 100, by = 10)),
cent.tend = sd, type = "c", elements = TRUE, col = c("black", rainbow(10)),
ylab = c("Disparity", "Diversity"), time.subsamples = FALSE,
ylab = c("Disparity", "Diversity"), time.subsets = FALSE,
xlab = "Time (in units from past to present)", observed = TRUE,
main = "Many more options...")
@@ -674,12 +674,12 @@ The function `test.dispRity` works in a similar way to the `dispRity` function:

The `comparisons` argument indicates the way the test should be applied to the data:

* `pairwise` (default): to compare each subsample in a pairwise manner
* `referential`: to compare each subsample to the first subsample
* `sequential`: to compare each subsample to the following subsample
* `all`: to compare all the subsamples together (like in analysis of variance)
* `pairwise` (default): to compare each subset in a pairwise manner
* `referential`: to compare each subset to the first subset
* `sequential`: to compare each subset to the following subset
* `all`: to compare all the subsets together (like in analysis of variance)

It is also possible to input a list of pairs of `numeric` values or `characters` matching the subsample names to create personalised tests.
It is also possible to input a list of pairs of `numeric` values or `characters` matching the subset names to create personalised tests.
Some other tests implemented in `dispRity` such as the `dispRity::null.test` have a specific way they are applied to the data and therefore ignore the `comparisons` argument.
<!-- Add sequential test one day! -->
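
To make the first three `comparisons` options concrete, the base R sketch below lists which pairs of subsets each one would compare. The helper `comparison_pairs` and the subset names are hypothetical, and the `all` option is left out because it compares every subset at once rather than in pairs.

```r
## Hypothetical subset names (e.g. four time slices)
subset_names <- c("120", "80", "40", "0")

## Return the pairs of subsets compared under each scheme
comparison_pairs <- function(subsets, comparisons = "pairwise") {
    n <- length(subsets)
    index <- switch(comparisons,
        ## every subset against every other subset
        pairwise = combn(n, 2, simplify = FALSE),
        ## every subset against the first one
        referential = lapply(2:n, function(i) c(1L, i)),
        ## every subset against the following one
        sequential = lapply(1:(n - 1), function(i) c(i, i + 1L)),
        stop("unknown comparisons type")
    )
    lapply(index, function(pair) subsets[pair])
}

comparison_pairs(subset_names, "pairwise")
comparison_pairs(subset_names, "referential")
comparison_pairs(subset_names, "sequential")
```

A personalised test, as mentioned above, would amount to supplying a hand-written list of such pairs instead of generating it.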
