/
region-spatial.Rmd
297 lines (226 loc) · 10.8 KB
/
region-spatial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
---
title: "Using spatial data to filter observations"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
---
```{r setup, include = FALSE}
options(width = 100)
knitr::opts_chunk$set(out.width = "60%", fig.width = 6, fig.asp = 1, fig.align = "center")
```
In this article we'll go over how to use spatial data to more precisely define regional filters. Often you might be interested only in observations which fall within a very specific geographic area. While the `nc_data_dl()` function cannot take a shapefile as an argument, you can use regional codes to filter your data, or you can use shape files to specify either the `utm_squares` or the `bbox` (bounding box) surrounding your area of interest. After the download, you can then trip the resulting observations to your original shape file.
> The following examples use the "testuser" user which is not available to you.
> You can quickly [sign up for a free account](https://naturecounts.ca/nc/default/register.jsp)
> of your own to access and play around with these examples. Simply replace
> `testuser` with your own username.
## Setup
We'll be using the [`tidyverse`](http://tidyverse.org) packages `dplyr` and `ggplot2` for data manipulation and plotting, respectively. We'll use the `gridExtra` to combine figures. We'll use the `sf` package for working with spatial data, and the `rnaturalearth` package to get example spatial data.
```{r, message = FALSE}
library(naturecounts)
library(dplyr)
library(ggplot2)
library(patchwork)
library(sf)
library(rnaturalearth)
```
If you have your own spatial data files that you would like to read into R, we recommend reading the [Reading, Writing and Converting Simple Features Vignette](https://r-spatial.github.io/sf/articles/sf2.html) from the [`sf` website](https://r-spatial.github.io/sf/).
First we'll get some spatial objects for our explorations from the `rnaturalearth` package. A map of Canada and a map of Ontario, both transformed from CRS 4326 (unprojected lat/lon) to 3347 (NAD83 Statistics Canada).
```{r}
na <- ne_countries(continent = "north america", returnclass = "sf") %>%
st_transform(3347)
canada <- ne_states(country = "canada", returnclass = "sf") %>%
st_transform(3347)
ontario <- ne_states(country = "Canada", returnclass = "sf") %>%
filter(name == "Ontario") %>%
st_transform(3347)
manitoba <- ne_states(country = "Canada", returnclass = "sf") %>%
filter(name == "Manitoba") %>%
st_transform(3347)
```
```{r}
ggplot() +
theme_bw() +
geom_sf(data = canada) + # Map of Canada
geom_sf(data = ontario, fill = "orange") # Map of Ontario
```
Alternatively specify the groups inside `ggplot()`
```{r}
ggplot() +
theme_bw() +
geom_sf(data = canada, aes(fill = name == "Ontario"), show.legend = FALSE)+
scale_fill_manual(values = c("grey90", "orange"))
```
## By Province/State
For example, if you wanted to collect all public observations in Winnipeg, Manitoba...
```{r winnipeg}
search_region("winnipeg", type = "subnational2")
obs <- nc_data_dl(region = list(subnational2 = "CA.MB.11"),
verbose = FALSE, username = "testuser", info = "nc_vignette")
```
Now we'll get a polygon representing Winnipeg, MB.
```{r}
winnipeg <- st_read("https://data.winnipeg.ca/api/geospatial/jx93-sett?method=export&format=GeoJSON") %>%
st_transform(3347)
obs_sf <- st_as_sf(obs, coords = c("longitude", "latitude"), crs = 4326) %>%
st_transform(3347)
ggplot() +
theme_bw() +
geom_sf(data = winnipeg) +
geom_sf(data = obs_sf)
```
Looks like our polygon of Winnipeg doesn't exactly match the subnational2 area. That's okay, we'll filter to match the polygon:
```{r}
obs_sf <- st_join(obs_sf, distinct(winnipeg), left = FALSE)
ggplot() +
theme_bw() +
geom_sf(data = winnipeg) +
geom_sf(data = obs_sf)
```
## By UTM Squares
In the following example, let's assume that you wish to concentrate only on observations from *urban* areas in Ontario, Canada.
We'll download that data with the `rnaturalearth` package and save it to the working directory (".")
```{r, eval = FALSE}
ne_download(scale = 10, type = "urban_areas", returnclass = "sf",
destdir = ".", load = FALSE)
```
```{r, echo = FALSE, eval = !file.exists(here::here("vignettes/region-spatial_files/ne_10m_urban_areas.shp"))}
# Run only if shape file not available
ne_download(scale = 10, type = "urban_areas", returnclass = "sf",
destdir = here::here("vignettes/region-spatial_files"), load = FALSE)
```
Now that we've saved it, we can load it for use.
```{r, eval = FALSE}
urban <- ne_load(scale = 10, type = "urban_areas", returnclass = "sf",
destdir = ".")
```
```{r, echo = FALSE}
urban <- ne_load(scale = 10, type = "urban_areas", returnclass = "sf",
destdir = here::here("vignettes/region-spatial_files"))
```
First we'll transform it to the 3347 CRS and clip the urban areas to match the borders of Ontario (`st_insection(spatial1, spatial2)`). Here we only care about the geometry of Ontario, not any other data (`st_geometry(ontario)`).
```{r}
urban_ontario <- urban %>%
st_transform(3347) %>%
st_intersection(st_geometry(ontario))
ggplot() +
theme_bw() +
geom_sf(data = ontario) +
geom_sf(data = urban_ontario, fill = "orange")
```
Now to filter your observations to urban areas, the first step would be to get all the UTM squares which overlap these areas. We can do this collecting the UTM squares with `meta_utm_squares()` and then filtering these to include only those that overlap these urban areas.
Once again, we only care about the actual geometries of `urban_ontario`, not about any of it's features, so we use `distinct()` to return distinct polygons without any other columns. Finally we use `unique()` at the end to remove any duplicates (i.e. `utm_squares` that overlap more than 1 urban polygon. Note that you must use `unique()` and not `distinct()` (which works on regular data frames, but here removes all data attributes).
```{r}
utm_on <- meta_utm_squares() %>%
filter(statprov_code == "ON") %>%
st_transform(3347) %>% # Transform to match urban CRS
st_join(., distinct(urban_ontario), left = FALSE) %>%
unique()
ggplot() +
theme_bw() +
geom_sf(data = ontario) +
geom_sf(data = utm_on, fill = "red") +
geom_sf(data = urban_ontario, fill = "blue", alpha = 0.5)
```
It's a bit tricky to see exactly what's going on, so let's zoom in a bit.
However, it's also a bit tricky to zoom when we're using a non-lat/lon CRS because the units are unintuitive. So let's specify the limits we want, then transform them in the CRS we're using and use that to set our limits.
```{r}
zoom <- data.frame(lon = c(-81, -78, -81, -78),
lat = c(43, 43, 45, 45)) %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
st_transform(3347) %>%
st_bbox()
zoom
```
For convenience, we'll use the patchwork package to combine the figures side-by-side so we can get a really good look at what we're doing.
```{r}
g1 <- ggplot() +
theme_bw() +
geom_sf(data = ontario) +
geom_sf(data = urban_ontario, fill = "blue", alpha = 0.5) +
coord_sf(xlim = zoom[c(1,3)], ylim = zoom[c(2,4)])
g2 <- ggplot() +
theme_bw() +
geom_sf(data = ontario) +
geom_sf(data = utm_on, fill = "red", alpha = 0.5) +
geom_sf(data = urban_ontario, fill = "blue", alpha = 0.5) +
coord_sf(xlim = zoom[c(1,3)], ylim = zoom[c(2,4)])
g1 + g2
```
So now we can see that the `utm_squares` we've selected overlap all our urban areas.
Now we can download the observations for all of these areas.
Because it can take time for the server to process so many `utm_squares`,
we'll increase the timeout to 5 minutes and will only download a couple of utm squares.
```{r}
obs <- nc_data_dl(region = list(utm_squares = utm_on$utm_square[1:10]),
verbose = FALSE, username = "testuser", timeout = 300,
info = "nc_vignette")
```
Finally, we do clip the resulting observations to the exact urban areas:
```{r}
obs_sf <- st_as_sf(obs, coords = c("longitude", "latitude"), crs = 4326) %>%
st_transform(crs = 3347) %>%
st_join(distinct(urban_ontario), left = FALSE)
ggplot() +
theme_bw() +
geom_sf(data = ontario) +
geom_sf(data = urban_ontario, fill = "blue", alpha = 0.5) +
geom_sf(data = obs_sf, size = 1) +
coord_sf(xlim = zoom[c(1,3)], ylim = zoom[c(2,4)])
```
However, sending large lists of UTM squares can be slow, so consider using a bounding box and filtering your downloaded data after the fact.
## By Bounding Box
In this example, we'll gather all observations for Jasper National Park in Alberta, Canada.
First we'll download and extract the shapefiles available from the [Alberta Parks](https://www.albertaparks.ca/albertaparksca/library/downloadable-data-sets/) website.
```{r, echo = FALSE, eval = !file.exists(here::here("vignettes/region-spatial_files/Parks_Protected_Areas_Alberta.shp"))}
# Run if don't have shapefile
download.file(url = "https://www.albertaparks.ca/media/2941843/parks_and_protected_areas_alberta.zip",
destfile = here::here("vignettes/region-spatial_files/alberta_parks.zip"))
unzip(here::here("vignettes/region-spatial_files/alberta_parks.zip"),
exdir = here::here("vignettes/region-spatial_files/"))
```
```{r, eval = FALSE}
url <- "https://www.albertaparks.ca/media/2941843/parks_and_protected_areas_alberta.zip"
download.file(url = url)
unzip("parks_and_protected_areas_alberta.zip")
parks <- st_read("Parks_Protected_Areas_Alberta.shp")
```
```{r echo = FALSE}
parks <- st_read(here::here("vignettes/region-spatial_files/Parks_Protected_Areas_Alberta.shp"))
```
Now we'll get a background map of Alberta and a spatial file of just Jasper.
```{r}
alberta <- ne_states(country = "Canada", returnclass = "sf") %>%
filter(name == "Alberta") %>%
st_transform(3347)
jasper <- filter(parks, NAME == "Jasper") %>%
st_transform(3347)
```
Let's see what that all looks like.
```{r}
ggplot() +
theme_bw() +
geom_sf(data = alberta) +
geom_sf(data = parks, aes(fill = NAME == "Jasper"), show.legend = FALSE) +
scale_fill_manual(values = c("lightyellow", "orange"))
```
Using a bounding box is a good way to download observations only from the Jasper area. But remember that the bounding box coordinates used by `nc_data_dl()` are in lat/lon, so we'll have to back transform.
```{r}
b <- jasper %>%
st_transform(4326) %>%
st_bbox()
b
```
We can give this directly to our `nc_data_dl()` function.
```{r}
obs <- nc_data_dl(region = list(bbox = b), verbose = FALSE, username = "testuser",
info = "nc_vignette")
```
Finally we clip the observations to the exact area of Jasper.
```{r}
obs_sf <- st_as_sf(obs, coords = c("longitude", "latitude"), crs = 4326) %>%
st_transform(3347) %>%
st_join(distinct(jasper), left = FALSE)
ggplot() +
theme_bw() +
geom_sf(data = jasper) +
geom_sf(data = obs_sf, size = 1, aes(colour = collection))
```