/
mining_species.Rmd
218 lines (177 loc) · 7.65 KB
/
mining_species.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
---
title: "Mining species list for any plant family"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Mining species list for any plant family}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{css, echo=FALSE}
.table{
width: auto;
font-size: 14px;
}
.table caption {
font-size: 1em;
}
```
Here in this article, we show how to use the package's function `powoSpecies`
for mining all accepted species for any genus or family of vascular plants in
the World Checklist of Vascular Plants (WCVP; Govaerts et al., 2021) available
through Kew’s Plants of The World Online (POWO, https://powo.science.kew.org).\
\
# Setup
Install the latest development version of __expowo__ from
[GitHub](https://github.com/):
``` r
#install.packages("devtools")
devtools::install_github("DBOSlab/expowo")
```
```{r package load, message=FALSE, warning=FALSE}
library(expowo)
```
\
## Mining all accepted species for any plant family
The function `powoSpecies` returns a data frame or saves a CSV file listing all
accepted species (including hybrid species or not), their publication,
authorship, and global geographic distribution at country or botanical level for
any family of vascular plants. The global classification of botanical divisions
follows the [World Geographical Scheme](https://www.tdwg.org/standards/wgsrpd/)
for Recording Plant Distributions, which is already associated with each taxon's
distribution in the POWO.\
The example below shows how to mine all accepted species for a specified vector
of families. The output shown here (TABLE 1) is a simplified version, where we
have removed some columns (genus, epithet and authors) so as to focus on the
display of the taxonomic and distribution data.\
```{r, eval = FALSE}
res <- powoSpecies(family = c("Cabombaceae", "Martyniaceae"),
hybrid = FALSE,
verbose = TRUE,
save = FALSE,
dir = "results_powoSpecies",
filename = "Cabom_Martyniaceae_search")
```
\
```{r, echo = FALSE, warning = FALSE}
utils::data("angioData")
fam <- c("Cabombaceae", "Martyniaceae")
df <- angioData[angioData$family %in% fam, ]
knitr::kable(df[-c(2, 3, 4, 5, 10, 11, 13)],
row.names = FALSE,
caption = "TABLE 1. A general `powoSpecies` search for mining all
accepted species for some specific plant families.")
```
\
## Narrowing down the `powoSpecies` search based on a specified country vector
You can also narrow down the species checklist search of any family by
focusing on just a particular country or a list of countries. You just need to
define a vector of country names in the argument \code{country}. In the example
below, note that we have originally searched for the species within the families
__Aristolochiaceae__, __Cabombaceae__, and __Martyniaceae__, but the function
returned a smaller species list, because many species in the searched families
are not recorded in any of the specified vector of country, i.e. __Brazil__ or
__Colombia__.\
```{r, eval = FALSE}
ACM <- powoSpecies(family = c("Aristolochiaceae", "Cabombaceae", "Martyniaceae"),
hybrid = FALSE,
country = c("Brazil", "Colombia"),
verbose = FALSE,
save = FALSE,
dir = "results_powoSpecies",
filename = "country_constrained_search")
```
\
```{r, echo = FALSE, warning = FALSE}
utils::data("angioData")
fam <- c("Aristolochiaceae", "Cabombaceae", "Martyniaceae")
df <- angioData[angioData$family %in% fam, ]
country <- c("Brazil", "Colombia")
df <- df[df$native_to_country %in% country, ]
knitr::kable(df[-c(2, 3, 4, 5, 10, 11, 12, 13)],
row.names = FALSE,
caption = "TABLE 2. A `powoSpecies` search based on a specified
country vector.")
```
\
## Narrowing down the `powoSpecies` search based on a specified genus and country vector
You may want to retrieve information for just one or a list of accepted
genera from a given country (or from a list of countries). Just like before,
you only need to define a vector of genus names in the argument \code{genus} and
a vector of country names in the argument \code{country}. In the example below,
see that we have searched for just the genera ***Proboscidea*** and
***Cabomba*** of the families __Martyniaceae__ and __Cabombaceae__,
but the function only returned the __Cabombaceae__ genus ***Cabomba***,
because ***Proboscidea*** does not occur in any of the provided list of countries,
i.e. __Argentina__, __French Guiana__, or __Brazil__.\
```{r, eval = FALSE}
MC <- powoSpecies(family = c("Martyniaceae", "Cabombaceae"),
genus = c("Proboscidea", "Cabomba"),
hybrid = FALSE,
country = c("Argentina", "French Guiana", "Brazil"),
verbose = TRUE,
save = FALSE,
dir = "results_powoSpecies",
filename = "country_and_genus_constrained_search")
```
\
```{r, echo = FALSE, warning = FALSE}
utils::data("angioData")
genus <- c("Proboscidea", "Cabomba")
country <- c("Argentina", "French Guiana", "Brazil")
df <- angioData[angioData$genus %in% genus, ]
df <- df[df$native_to_country %in% country, ]
knitr::kable(df[-c(2, 3, 10, 11, 13)],
row.names = FALSE,
caption = "TABLE 3. A `powoSpecies` search based on specified genus
and country vectors.")
```
\
## Mining all accepted species for all plant families
To mine a species checklist for all families of vascular plants, you recommend
to load the dataframe-formatted data object called `POWOcodes` that comes
associated with the __expowo__ package. The `POWOcodes` data object
already contains the URI addresses for all plant families recognized in
the [POWO](https://powo.science.kew.org) database, so you just need to call it
to your R environment.\
The example below shows how to mine a global checklist of all accepted species
of vascular plants by using the vector of all plant families and
associated URI addresses stored in the `POWOcodes` object.\
```{r, eval = FALSE}
utils::data("POWOcodes")
ALL_spp <- powoSpecies(POWOcodes$family,
hybrid = TRUE,
verbose = TRUE,
save = FALSE,
dir = "results_powoSpecies",
filename = "all_plant_species")
```
\
Since this is a very long search that might requires hours to be completed, it
may happen to loose internet connection at some point of the queried search.
But you do not need to start an entirely new search from the beginning. By
setting the `powoSpecies` with `rerun = TRUE`, a previously stopped search will
continue from where it left off, starting with the last retrieved taxon. Please
ensure that the 'filename' argument exactly matches the name of the CSV file
saved from the previous search, and that the previously saved CSV file is
located within a subfolder named after the current date. If it is not, please
rename the date subfolder accordingly.
\
```{r, eval = FALSE}
utils::data("POWOcodes")
ALL_spp <- powoSpecies(POWOcodes$family,
hybrid = TRUE,
verbose = TRUE,
rerun = TRUE,
save = FALSE,
dir = "results_powoSpecies",
filename = "all_plant_species")
```
\
## Reference
POWO (2019). "Plants of the World Online. Facilitated by the Royal Botanic Gardens, Kew. Published on the Internet; http://www.plantsoftheworldonline.org/ Retrieved April 2023."