-
Notifications
You must be signed in to change notification settings - Fork 1
/
BIRDS.Rmd
208 lines (177 loc) · 9.83 KB
/
BIRDS.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
---
title: "Introduction to BIRDS"
author: "Alejandro Ruete and Debora Arlt"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
fig_caption: TRUE
vignette: >
%\VignetteIndexEntry{Introduction to BIRDS}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Installing 'BIRDS'
The package 'BIRDS' is currently available on GitHub and you can freely download
the source code here [BIRDS on GitHub](https://github.com/GreenswayAB/BIRDS)
The easiest option is to install the package directly from GitHub using the
package 'remotes'. Install 'remotes as usual `install.packages('remotes')`
if you do not already have installed it.
```{r install package, eval = F}
remotes::install_github('GreenswayAB/BIRDS')
```
If you receive an error we would love to receive your bug report
[here](https://github.com/GreenswayAB/BIRDS/issues)
## Basic example and data requirements
Starting with a primary biodiversity data set (species observations) `BIRDS` employs
three basic steps: organizing the data into a sampling event-based format, summarizing
the data producing summary variables that inform about sampling effort and
data completeness, and reviewing the summarized variables.
This package works with `data.frame` and `data.table` tables containing raw species observations
as rows (i.e. primary biodiversity data, PBD), with minimal information required
for each observation in the following columns:
1. species identification
2. x and y spatial coordinates in two columns (epsg:4326 assumed if not otherwise
specified)
3. date of the observation (either in one formatted column or three columns y-m-d)
The function `organiseBirds()` converts ‘data.frame’ into a `SpatialPointsDataFrame`
adding to each observation a unique identifier for the assumed visit (sampling event)
it belongs to, given some input parameters. The function `organizeBirds()` will interpret
the DarwinCore standard for column names as default, but other names can be specified
too. For more details on how a visit is defined and options for each function please
refer to the [technical vignette](https://greenswayab.github.io/BIRDS/articles/technical_details.html) and to
each function's help page.
Then, the function `summariseBirds()` will overlay the data with a given spatial
grid and create a set of objects that summarize the data spatially, temporally,
and spatio-temporally, and also provide other intermediate results useful for
later analyses. Finally, the function `exportBirds()` helps the user to obtain
the data ready to be plotted with widely known functions.
We use as an example a data set consisting of 10k species observations of
bumblebees (i.e. *Bombus* spp.) in Götaland (the southern part of Sweden) during
2000-2018. The data set `bombusObs` is part of this package and its metadata
can be found in the data set help page `?bombusObs`.
You can easily create a grid over a sample area (i.e. `gotaland`), organize and
summarize the data, and export the variables you want to plot:
```{r checkPROJ, echo=FALSE}
projVer <- sf::sf_extSoftVersion()["PROJ"]
doNextChunk <- as.logical(compareVersion("5.1.0", projVer) == -1)
```
```{r basic example, eval = doNextChunk}
library(BIRDS)
library(sf)
# Create a grid for your sample area that will be used to summarise the data:
grid <- makeGrid(gotaland, gridSize = 10)
# The grid can be easily created in different ways.
# Import the species observation data:
PBD <- bombusObs
# alternatively, you could load a previously downloaded .CSV file
# Convert the data from an observation-based to a visit-based format, adding a
# unique identifier for each visit:
OB <- organizeBirds(PBD, sppCol = "scientificName", simplifySppName = FALSE)
# Summarise the data:
SB <- summariseBirds(OB, grid=grid)
# Look at some summarised variables:
# Number of observations
EBnObs <- exportBirds(SB, dimension = "temporal", timeRes = "yearly",
variable = "nObs", method = "sum")
# Number of visits
EBnVis <- exportBirds(SB, dimension = "temporal", timeRes = "yearly",
variable = "nVis", method = "sum")
# The ratio of number of observations over number of visits
relObs <- EBnObs/EBnVis
# Average species list length (SLL) per year (a double-average, i.e. the mean
# over cell values for the median SLL from all visits per year and cell)
EBavgSll <- colMeans(SB$spatioTemporal[,,"Yearly","avgSll"], na.rm = TRUE)
```
Then, with functions you already know, plot this into a time series.
```{r figure 1, eval = doNextChunk, fig.show='hold', fig.width= 7, fig.height= 5, fig.cap = "Time series for *Bombus* spp. dataset."}
oldpar <- par(no.readonly =TRUE)
par(mar=c(4,4,1,6), las=1)
plot(time(EBnObs), EBnObs, type = "l", lwd = 3, xlab = "Year", ylab = "Number",
ylim=c(0, max(EBnObs)), xaxp=c(2000, 2018, 18))
lines(time(EBnObs), EBnVis, lwd=3, lty=2)
lines(time(EBnObs), relObs * max(EBnObs) / max(relObs), lwd=3, lty=1, col="#78D2EB")
lines(time(EBnObs), EBavgSll * max(EBnObs) / max(EBavgSll), lwd=3, lty=1, col="#FFB3B5")
axis(4, at = seq(0, max(EBnObs), length.out = 5),
labels = round(seq(0,max(relObs), length.out = 5), 1),
lwd = 2, col = "#78D2EB", col.ticks = "#78D2EB")
axis(4, at = seq(0, max(EBnObs), length.out = 5) ,
labels = round(seq(0,max(EBavgSll), length.out = 5), 1),
lwd = 2, col = "#FFB3B5", col.ticks = "#FFB3B5", line = 3)
legend("topleft", legend=c("n.observations","n.visits"),
lty = c(1,2), lwd = 3, bty = "n")
legend("bottomright", legend=c("n.observations / n.visits", "avg. SLL per cell"),
lty = 1, lwd = 3, col = c("#78D2EB", "#FFB3B5"), bty = "n")
par(oldpar)
```
Although we downloaded the data in April 2019, we know that the latest reports
for 2018 were not yet uploaded and we see that in the figure.
We can see that over the years there are on average more observations, more visits and
also more species reported for each visit. While it looks like as sampling effort has increased
we can also see that there are on average fewer observations per visit. This suggests that the
sampling effort per visit has actually decreased. There seem to be many more visits reported
in later years driving also an increase in number of observations, but many visits hold only few observations.
There seem to be also few visits with long species lists driving the increase of average SLL.
Then, we can summarize spatial data in some maps, to review, for example,
Where do we have visits at all?, or How often are visits performed in July vs December?
```{r figure 2, eval = doNextChunk, fig.show='hold', fig.width= 7, fig.height= 3}
wNonEmpty <- whichNonEmpty( SB$overlaid )
EB <- exportBirds(SB, "Spatial", "Month", "nYears", "sum")
# because the dimension is "spatial", the result is a 'SpatialPolygonDataFrame'
palBW <- leaflet::colorNumeric(c("white", "navyblue"),
c(0, max(st_drop_geometry(EB), na.rm = TRUE)),
na.color = "transparent")
oldpar <- par(no.readonly =TRUE)
par(mfrow=c(1,3), mar=c(1,1,1,1))
plot(SB$spatial$geometry[wNonEmpty], col="grey", border = NA)
plot(gotaland$geometry, col=NA, border = "grey", lwd=1, add=TRUE)
mtext("Visited cells", 3, line=-1)
plot(EB$geometry, col=palBW(EB$Jul), border = NA)
plot(gotaland$geometry, col=NA, border = "grey", lwd=1, add=TRUE)
mtext("Number of years for which \nJuly was sampled", 3, line=-2)
plot(EB$geometry, col=palBW(EB$Dec), border = NA)
plot(gotaland$geometry, col=NA, border = "grey", lwd=1, add=TRUE)
mtext("Number of years for which \nDecember was sampled", 3, line=-2)
legend("bottomleft",
legend=seq(0, max(st_drop_geometry(EB), na.rm = TRUE), length.out = 5),
col = palBW(seq(0, max(st_drop_geometry(EB), na.rm = TRUE), length.out = 5)),
title = "Number of years", pch = 15, bty="n")
par(oldpar)
```
We could also map the ignorance scores based on the either the number of observations
or visits using the function `exposeIgnorance()`. Ignorance scores are a proxy
for the lack of sampling effort, computed by making the number of observations
relative to a reference number of observations that is considered to be enough to
reduce the ignorance score by half (henceforth the Half-ignorance approach).
The algorithm behind the Ignorance Score is designed for comparison of bias and
gaps in primary biodiversity data across taxonomy, time and space
Read more here: Ruete (2015) Biodiv Data J 3:e5361 \doi{10.3897/BDJ.3.e5361}.
```{r figure 3, eval = doNextChunk, fig.show='hold', fig.width= 7, fig.height= 4}
oldpar <- par(no.readonly =TRUE)
par(mfrow=c(1,2), mar=c(1,1,1,1))
palBW <- leaflet::colorNumeric(c("white", "navyblue"),
c(0, max(SB$spatial$nVis, na.rm = TRUE)),
na.color = "transparent")
seqNVis <- round(seq(0, max(SB$spatial$nVis, na.rm = TRUE), length.out = 5))
plot(SB$spatial$geometry, col=palBW(SB$spatial$nVis), border = NA)
plot(gotaland$geometry, col=NA, border = "grey", lwd=1, add=TRUE)
legend("bottomleft", legend=seqNVis, col = palBW(seqNVis),
title = "Number of \nobservations", pch = 15, bty="n")
ign <- exposeIgnorance(SB$spatial$nVis, h = 5)
palBWR <- leaflet::colorNumeric(c("navyblue", "white","red"), c(0, 1),
na.color = "transparent")
plot(gotaland$geometry, col="grey90", border = "grey90", lwd=1)
plot(SB$spatial$geometry, col=palBWR(ign), border = NA, add=TRUE)
plot(gotaland$geometry, col=NA, border = "grey", lwd=1, add=TRUE)
legend("bottomleft", legend=c(seq(0, 1, length.out = 5), "NA"),
col = c(palBWR(seq(0, 1, length.out = 5)), "grey90"),
title = "Ignorance \nnVis, \nO0.5=5", pch = 15, bty="n")
par(oldpar)
```
The ignorance scores are highlighting where we have little information and hence cannot be
confident about the lack of reports implying an absence of the species.