Updated book (subsamples -> subsets)

TGuillerme · Nov 15, 2017 · 8e46dba
1 parent 480223a
commit 8e46dba

Showing 25 changed files with 635 additions and 640 deletions.
diff --git a/disparity_object.md b/disparity_object.md
@@ -7,29 +7,29 @@ object
 	|
 	\---$call* = class:"list" (details of the methods used)
 	|	|
-	|	\---$subsamples = class:"character"
+	|	\---$subsets = class:"character"
 	|	|
 	|	\---$bootstrap = class:"character"
 	|	|
 	|	\---$dimensions = class:"numeric"
 	|	|
 	|	\---$metric = class:"character"
 	|
-	\---$subsamples* = class:"list" (subsamples as a list)
+	\---$subsets* = class:"list" (subsets as a list)
 	|	|
-	|	\---[[1]]* = class:"list" (first item in subsamples list)
+	|	\---[[1]]* = class:"list" (first item in subsets list)
 	|	|	|
-	|	|	\---$elements* = class:"matrix" (one column matrix containing the elements within the first subsample)
+	|	|	\---$elements* = class:"matrix" (one column matrix containing the elements within the first subset)
 	|	|	|
 	|	|	\---[[2]] = class:"matrix" (matrix containing the bootstrap draws for the unrarefied data)
 	|	|	|
 	|	|	\---[[3]] = class:"matrix" (matrix containing the bootstrap draws for the first rarefaction level)
 	|	|	|
 	|	|	\---[[...]] = class:"matrix" (matrix containing the bootstrap draws for the second rarefaction level etc.)
 	|	|
-	|	\---[[2]] = class:"list" (second item in subsamples list)
+	|	\---[[2]] = class:"list" (second item in subsets list)
 	|	|	|
-	|	|	\---$elements* = class:"matrix" (one column matrix containing the elements within the second subsample)
+	|	|	\---$elements* = class:"matrix" (one column matrix containing the elements within the second subset)
 	|	|	|
 	|	|	\---[[2]] = class:"matrix" (matrix containing the bootstrap draws for the unrarefied data)
 	|	|	|
@@ -41,27 +41,27 @@ object
 	|	|	|	|
 	|	|		\---[[...]] = class:"numeric" (the bootstraps)
 	|	|
-	|	\---[[...]] = class:"list" (the following subsamples)
+	|	\---[[...]] = class:"list" (the following subsets)
 	|		|
-	|		\---$elements* = class:"matrix" (a one column matrix containing the elements within this subsample)
+	|		\---$elements* = class:"matrix" (a one column matrix containing the elements within this subset)
 	|		|
 	|		\---[[...]] = class:"matrix" (the rarefactions)
 	|
 	\---$disparity
 		|
-		\---[[2]] = class:"list" (the first subsamples)
+		\---[[2]] = class:"list" (the first subsets)
 		|	|
-		|	\---$observed* = class:"numeric" (vector containing the observed disparity within the subsamples)
+		|	\---$observed* = class:"numeric" (vector containing the observed disparity within the subsets)
 		|	|
 		|	\---[[2]] = class:"matrix" (matrix containing the bootstrap draws for the unrarefied data)
 		|	|
 		|	\---[[3]] = class:"matrix" (matrix containing the bootstrap draws for the first rarefaction level)
 		|	|
 		|	\---[[...]] = class:"matrix" (matrix containing the bootstrap draws for the second rarefaction level etc.)
 		|
-		\---[[2]] = class:"list" (the first subsamples)
+		\---[[2]] = class:"list" (the first subsets)
 		|	|
-		|	\---$observed* = class:"numeric" (vector containing the observed disparity within the subsamples)
+		|	\---$observed* = class:"numeric" (vector containing the observed disparity within the subsets)
 		|	|
 		|	\---[[2]] = class:"matrix" (matrix containing the bootstrap draws for the unrarefied data)
 		|	|
@@ -73,9 +73,9 @@ object
 		|		|
 		|		\---[[...]] = class:"numeric" (the bootstraps)
 		|
-		\---[[...]] = class:"list" (the following subsamples)
+		\---[[...]] = class:"list" (the following subsets)
 			|
-			\---$observed* = class:"numeric" (the vector containing the observed disparity within this subsamples)
+			\---$observed* = class:"numeric" (the vector containing the observed disparity within this subset)
 			|
 			\---[[...]] = class:"matrix" (the rarefactions)
 ```
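As a rough illustration of what the rename means for code that reads the object, the structure above can be mocked with plain base R lists. This is only a sketch: `mock` and its contents are hypothetical and built by hand, whereas a real object would come from the package's own functions.

```r
## Hypothetical mock of the renamed structure, built by hand with base R only
mock <- list(
    call = list(subsets = "customised", bootstrap = "2",
                dimensions = 3, metric = "sum"),
    subsets = list(
        crown = list(
            ## One column matrix containing the elements of the first subset
            elements = matrix(c(1L, 2L, 3L), ncol = 1),
            ## Bootstrap draws for the unrarefied data (one column per draw)
            matrix(c(1L, 1L, 3L, 2L, 3L, 3L), ncol = 2)
        ),
        stem = list(
            elements = matrix(c(4L, 5L), ncol = 1),
            matrix(c(4L, 4L, 5L, 4L), ncol = 2)
        )
    )
)

## Elements of the first subset, accessed through the new `$subsets` name
first_elements <- mock$subsets[[1]]$elements

## Number of elements in each subset
subset_sizes <- vapply(mock$subsets, function(x) nrow(x$elements), integer(1))

## Code still using the old name gets NULL rather than an error
old_access <- mock$subsamples
```

The last line is the practical point: accessing a list element by the old name does not fail loudly, so downstream scripts may need a search for `$subsamples` after this change.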

diff --git a/inst/gitbook/01_glossary.Rmd b/inst/gitbook/01_glossary.Rmd
@@ -19,6 +19,6 @@ output:
 
 -   **Dimensions**. The columns of the multidimensional space matrix. The dimensions can be referred to as axes of variation, or principal components, for ordinated spaces obtained from a PCA for example.
 
--   **Subsamples**. Subsamples of the multidimensional space.
-    A subsample (or subsamples) contains the same number of dimensions as the space but may contain a smaller subset of elements.
-    For example, if our space is composed of birds and mammals (the elements) and 50 principal components of variation (the dimensions), we can create two subsamples containing just mammals or birds, but with the same 50 dimensions, to compare disparity in the two clades.
+-   **Subsets**. Subsets of the multidimensional space.
+    A subset (or subsets) contains the same number of dimensions as the space but may contain a smaller subset of elements.
+    For example, if our space is composed of birds and mammals (the elements) and 50 principal components of variation (the dimensions), we can create two subsets containing just mammals or birds, but with the same 50 dimensions, to compare disparity in the two clades.
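The glossary definition of a subset can be sketched with base R matrix indexing. The element names, the toy space and the `sum_of_variances` helper below are all made up for illustration; they do not use the package's own subsetting functions.

```r
## Toy space: 6 elements (rows) by 4 dimensions (columns); all names are made up
set.seed(1)
space <- matrix(rnorm(24), nrow = 6, ncol = 4,
                dimnames = list(c(paste0("bird", 1:3), paste0("mammal", 1:3)),
                                paste0("PC", 1:4)))

## Two subsets: fewer elements each, but the same dimensions as the full space
birds   <- space[grep("^bird", rownames(space)), , drop = FALSE]
mammals <- space[grep("^mammal", rownames(space)), , drop = FALSE]

## Sum of variances per subset, computed with base R as a stand-in metric
sum_of_variances <- function(x) sum(apply(x, 2, var))
disparity <- c(birds = sum_of_variances(birds),
               mammals = sum_of_variances(mammals))
```

Both `birds` and `mammals` keep all four columns of `space`, which is what makes their disparity values comparable.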
diff --git a/inst/gitbook/02_getting-started.Rmd b/inst/gitbook/02_getting-started.Rmd
@@ -62,7 +62,7 @@ geomorph.ordination(procrustes)[1:5,1:5]
 
 Options for the ordination (from `?prcomp`) can be directly passed to this function to perform customised ordinations.
 Additionally you can give the function a `geomorph.data.frame` object.
-If the latter contains sorting information (i.e. factors), they can be directly used to make a customised `dispRity` object [customised `dispRity` object](#customised-subsamples)!
+If the latter contains sorting information (i.e. factors), it can be used directly to make a [customised `dispRity` object](#customised-subsets)!
 
 ```{r}
 ## Using a geomorph.data.frame
@@ -126,7 +126,7 @@ For example, which metric should you use?
 How many bootstraps do you require?
 What model of evolution is most appropriate if you are time slicing?
 Should you rarefy the data?
-See [`time.subsamples`](#time-slicing), [`custom.subsamples`](#customised-subsamples), [`boot.matrix`](#bootstraps-and-rarefactions) and [`dispRity.metric`](#disparity-metrics) for more details of the defaults used in each of these functions.
+See [`time.subsets`](#time-slicing), [`custom.subsets`](#customised-subsets), [`boot.matrix`](#bootstraps-and-rarefactions) and [`dispRity.metric`](#disparity-metrics) for more details of the defaults used in each of these functions.
 Note that any of these default arguments can be changed within the `disparity.through.time` or `disparity.per.group` functions.
 
 ### Example data
@@ -168,7 +168,7 @@ For a disparity through time analysis, you will need:
 
   * An ordinated matrix (we covered that above)
   * A phylogenetic tree: this must be a `phylo` object (from the `ape` package) and needs a `root.time` element. To give your tree a root time (i.e. an age for the root), you can simply do\\ `my_tree$root.time <- my_age`.
-  * The required number of time subsamples (here `time = 3`)
+  * The required number of time subsets (here `time = 3`)
   * Your favourite disparity metric (here the sum of variances)
 
 Using the Beck and Lee (2014) data described [above](#example-data):
@@ -187,7 +187,7 @@ When displayed, these `dispRity` objects provide us with information on the oper
 disparity_data
 ```
 
-We asked for three subsamples (evenly spread across the age of the tree), the data was bootstrapped 100 times (default) and the metric used was the sum of variances.
+We asked for three subsets (evenly spread across the age of the tree), the data was bootstrapped 100 times (default) and the metric used was the sum of variances.
 
 We can now summarise or plot the `disparity_data` object, or perform statistical tests on it (e.g. a simple `lm`): 
 

diff --git a/inst/gitbook/03_specific-tutorials.Rmd b/inst/gitbook/03_specific-tutorials.Rmd
@@ -25,13 +25,13 @@ data(BeckLee_tree) ; data(BeckLee_ages)
 
 ## Time slicing
 
-The function `time.subsamples` allows users to divide the matrix into different time subsamples or slices given a dated phylogeny that contains all the elements (i.e. taxa) from the matrix.
-Each subsample generated by this function will then contain all the elements present at a specific point in time or during a specific period in time.
+The function `time.subsets` allows users to divide the matrix into different time subsets or slices given a dated phylogeny that contains all the elements (i.e. taxa) from the matrix.
+Each subset generated by this function will then contain all the elements present at a specific point in time or during a specific period in time.
 
-Two types of time subsamples can be performed by using the `method` option:
+Two types of time subsets can be performed by using the `method` option:
 
- *  Discrete time subsamples (or time-binning) using `method = discrete`
- *  Continuous time subsamples (or time-slicing) using `method = continuous`
+ *  Discrete time subsets (or time-binning) using `method = "discrete"`
+ *  Continuous time subsets (or time-slicing) using `method = "continuous"`
 
 For the time-slicing method details see Cooper and Guillerme (in prep.). <!-- @@@ Change cite appropriately! -->
 <!-- NC: Or potentially this paper we are writing for the PalAss? TG: totally!-->
@@ -47,15 +47,15 @@ Here is an example for `method = discrete`:
 
 ```{r, eval=TRUE}
 ## Generating three time bins containing the taxa present every 40 Ma
-time.subsamples(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
+time.subsets(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
                 time = c(120, 80, 40, 0))
 ```
 
 Note that we can also generate equivalent results by just telling the function that we want three time-bins as follow:
 
 ```{r, eval=TRUE}
 ## Automatically generate three equal length bins:
-time.subsamples(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
+time.subsets(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
                 time = 3)
 ```
 
@@ -70,7 +70,7 @@ This table should have the taxa names as row names and two columns for respectiv
 head(BeckLee_ages)
 
 ## Generating time bins including taxa that might span between them
-time.subsamples(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
+time.subsets(data = BeckLee_mat50, tree = BeckLee_tree, method = "discrete",
                 time = c(120, 80, 40, 0), FADLAD = BeckLee_ages)
 ```
 
@@ -96,18 +96,18 @@ These later models perform better when bootstrapped, effectively approximating t
 
 ```{r, eval=TRUE}
 ## Generating four time slices every 40 million years under a model of proximity evolution
-time.subsamples(data = BeckLee_mat99, tree = BeckLee_tree, 
+time.subsets(data = BeckLee_mat99, tree = BeckLee_tree, 
     method = "continuous", model = "proximity", time = c(120, 80, 40, 0),
     FADLAD = BeckLee_ages)
 
 ## Generating four time slices automatically
-time.subsamples(data = BeckLee_mat99, tree = BeckLee_tree,
+time.subsets(data = BeckLee_mat99, tree = BeckLee_tree,
     method = "continuous", model = "proximity", time = 4, FADLAD = BeckLee_ages)
 ```
 
-## Customised subsamples
+## Customised subsets
 
-Another way of separating elements into different categories is to use customised subsamples as briefly explained [above](#disparity-among-groups).
+Another way of separating elements into different categories is to use customised subsets as briefly explained [above](#disparity-among-groups).
 This function simply takes the list of elements to put in each group (whether they are the actual element names or their position in the matrix).
 
 ```{r, eval=TRUE}
@@ -116,7 +116,7 @@ mammal_groups <- list("crown" = c(16, 19:41, 45:50),
                       "stem" = c(1:15, 17:18, 42:44))
 
 ## Separating the dataset into two different groups
-custom.subsamples(BeckLee_mat50, group = mammal_groups)
+custom.subsets(BeckLee_mat50, group = mammal_groups)
 ```
 
 Elements can easily be assigned to different groups if necessary!
@@ -156,7 +156,7 @@ Rarefaction allows users to limit the number of elements to be drawn at each boo
 This is useful if, for example, one is interested in looking at the effect of reducing the number of elements on the results of an analysis.
 
 This can be achieved by using the `rarefaction` option that draws only *n-x* at each bootstrap replicate (where *x* is the number of elements not sampled).
-The default argument is `FALSE` but it can be set to `TRUE` to fully rarefy the data (i.e. remove *x* elements for the number of pseudo-replicates, where *x* varies from the maximum number of elements present in each subsample to a minimum of three elements).
+The default argument is `FALSE` but it can be set to `TRUE` to fully rarefy the data (i.e. remove *x* elements for the number of pseudo-replicates, where *x* varies from the maximum number of elements present in each subset to a minimum of three elements).
 It can also be set to one or more `numeric` values to only rarefy to the corresponding number of elements.
 
 ```{r, eval=TRUE}
@@ -178,23 +178,23 @@ boot.matrix(BeckLee_mat50, dimensions = 0.5)
 boot.matrix(BeckLee_mat50, dimensions = 10)
 ```
 
-Of course, one could directly supply the subsamples generated above (using `time.subsamples` or `custom.subsamples`) to this function.
+Of course, one could directly supply the subsets generated above (using `time.subsets` or `custom.subsets`) to this function.
 
 ```{r, eval=TRUE}
-## Creating subsamples of crown and stem mammals
-crown_stem <- custom.subsamples(BeckLee_mat50,
+## Creating subsets of crown and stem mammals
+crown_stem <- custom.subsets(BeckLee_mat50,
                                 group = list("crown" = c(16, 19:41, 45:50), 
                                              "stem" = c(1:15, 17:18, 42:44)))
 ## Bootstrapping and rarefying these groups
 boot.matrix(crown_stem, bootstraps = 200, rarefaction = TRUE)
 
-## Creating time slice subsamples
-time_slices <- time.subsamples(data = BeckLee_mat99, tree = BeckLee_tree, 
+## Creating time slice subsets
+time_slices <- time.subsets(data = BeckLee_mat99, tree = BeckLee_tree, 
                                method = "continuous", model = "proximity", 
                                time = c(120, 80, 40, 0),
                                FADLAD = BeckLee_ages)
 
-## Bootstrapping the time slice subsamples
+## Bootstrapping the time slice subsets
 boot.matrix(time_slices, bootstraps = 100)
 ```
 
@@ -410,7 +410,7 @@ The functions `ellipse.volume`, `convhull.surface` and `convhull.volume`  all me
 ## Calculating the ellipsoid volume
 summary(dispRity(dummy_space, metric = ellipse.volume))
 ```
-> Because there is only one subsample (i.e. one matrix) in the dispRity object, this operation is the equivalent of `ellipse.volume(dummy_space)` (with rounding).
+> Because there is only one subset (i.e. one matrix) in the dispRity object, this operation is the equivalent of `ellipse.volume(dummy_space)` (with rounding).
 
 ```{r}
 ## Calculating the convex hull surface
@@ -496,28 +496,28 @@ This function is an S3 function (`summary.dispRity`) allowing users to summarise
 
 ```{r}
 ## Example data from previous sections
-crown_stem <- custom.subsamples(BeckLee_mat50,
+crown_stem <- custom.subsets(BeckLee_mat50,
                                 group = list("crown" = c(16, 19:41, 45:50), 
                                              "stem" = c(1:15, 17:18, 42:44)))
 ## Bootstrapping and rarefying these groups
 boot_crown_stem <- boot.matrix(crown_stem, bootstraps = 100, rarefaction = TRUE)
 ## Calculate disparity
 disparity_crown_stem <- dispRity(boot_crown_stem, metric = c(sum, variances))
 
-## Creating time slice subsamples
-time_slices <- time.subsamples(data = BeckLee_mat99, tree = BeckLee_tree, 
+## Creating time slice subsets
+time_slices <- time.subsets(data = BeckLee_mat99, tree = BeckLee_tree, 
     method = "continuous", model = "proximity", time = c(120, 80, 40, 0),
     FADLAD = BeckLee_ages)
-## Bootstrapping the time slice subsamples
+## Bootstrapping the time slice subsets
 boot_time_slices <- boot.matrix(time_slices, bootstraps = 100)
 ## Calculate disparity
 disparity_time_slices <- dispRity(boot_time_slices, metric = c(sum, variances))
 
-## Creating time bin subsamples
-time_bins <- time.subsamples(data = BeckLee_mat99, tree = BeckLee_tree, 
+## Creating time bin subsets
+time_bins <- time.subsets(data = BeckLee_mat99, tree = BeckLee_tree, 
     method = "discrete", time = c(120, 80, 40, 0), FADLAD = BeckLee_ages,
     inc.nodes = TRUE)
-## Bootstrapping the time bin subsamples
+## Bootstrapping the time bin subsets
 boot_time_bins <- boot.matrix(time_bins, bootstraps = 100)
 ## Calculate disparity
 disparity_time_bins <- dispRity(boot_time_bins, metric = c(sum, variances))
@@ -530,7 +530,7 @@ These objects are easy to summarise as follows:
 summary(disparity_time_slices)
 ```
 
-Information about the number of elements in each subsample and the observed (i.e. non-bootstrapped) disparity are also calculated.
+Information about the number of elements in each subset and the observed (i.e. non-bootstrapped) disparity are also calculated.
 This is specifically handy when rarefying the data for example:
 
 ```{r}
@@ -573,9 +573,9 @@ The plots can be of four different types:
  * `continuous` for displaying continuous disparity curves
  * `box`, `lines`, and `polygons` to display discrete disparity results in respectively a boxplot, confidence interval lines, and confidence interval polygons.
 
-> This argument can be left empty. In this case, the algorithm will automatically detect the type of subsamples from the `dispRity` object and plot accordingly.
+> This argument can be left empty. In this case, the algorithm will automatically detect the type of subsets from the `dispRity` object and plot accordingly.
 
-It is also possible to display the number of elements in each subsample (as a horizontal dotted line) using the option `elements = TRUE`.
+It is also possible to display the number of elements in each subset (as a horizontal dotted line) using the option `elements = TRUE`.
 Additionally, when the data is rarefied, one can indicate which level of rarefaction to display (i.e. only display the results for a certain number of elements) by using the `rarefaction` argument.
 
 ```{r, fig.width=8, fig.height=8}
@@ -628,7 +628,7 @@ op <- par(bty = "n")
 ## Plotting the results with some plot.dispRity arguments
 plot(disparity_time_slices, quantile = c(seq(from = 10, to = 100, by = 10)),
     cent.tend = sd, type = "c", elements = TRUE, col = c("black", rainbow(10)),
-    ylab = c("Disparity", "Diversity"), time.subsamples = FALSE,
+    ylab = c("Disparity", "Diversity"), time.subsets = FALSE,
     xlab = "Time (in in units from past to present)", observed = TRUE,
     main = "Many more options...")
 
@@ -674,12 +674,12 @@ The function `test.dispRity` works in a similar way to the `dispRity` function:
 
 The `comparisons` argument indicates the way the test should be applied to the data:
 
- * `pairwise` (default): to compare each subsample in a pairwise manner
- * `referential`: to compare each subsample to the first subsample
- * `sequential`: to compare each subsample to the following subsample
- * `all`: to compare all the subsamples together (like in analysis of variance)
+ * `pairwise` (default): to compare each subset in a pairwise manner
+ * `referential`: to compare each subset to the first subset
+ * `sequential`: to compare each subset to the following subset
+ * `all`: to compare all the subsets together (like in analysis of variance)
 
-It is also possible to input a list of pairs of `numeric` values or `characters` matching the subsample names to create personalised tests.
+It is also possible to input a list of pairs of `numeric` values or `characters` matching the subset names to create personalised tests.
 Some other tests implemented in `dispRity` such as the `dispRity::null.test` have a specific way they are applied to the data and therefore ignore the `comparisons` argument. 
 <!-- Add sequential test one day! -->
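
The pairings implied by the `comparisons` options above can be sketched with base R. This is an illustration of which subsets end up compared, not the package's internal implementation; `subset_names` and `label` are hypothetical names.

```r
## Hypothetical subset names; the pairs are built with base R only
subset_names <- c("120", "80", "40", "0")
n <- length(subset_names)

## pairwise: every unordered pair of subsets
pairwise <- combn(n, 2, simplify = FALSE)

## referential: the first subset against each of the others
referential <- lapply(2:n, function(i) c(1L, i))

## sequential: each subset against the following one
sequential <- lapply(seq_len(n - 1), function(i) c(i, i + 1L))

## Helper to display a list of pairs using the subset names
label <- function(pairs) {
    vapply(pairs, function(p) paste(subset_names[p], collapse = " : "),
           character(1))
}
```

A list of numeric pairs shaped like `sequential` is the kind of input the text describes for personalised tests.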