---
title: "`spocc` Package Tutorial"
output:
  html_notebook: default
  pdf_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Portions of the following tutorial are modified from rOpenSci's spocc tutorial: https://ropensci.org/tutorials/spocc_tutorial/.
## What is `spocc`?
`spocc` (short for "species occurrence") is an R package for pulling species occurrence data from a variety of online databases. In this tutorial we will use `spocc` to download occurrence data from GBIF, although it can download from other sources as well.
## Install `spocc`
Just like the other R packages we've worked with, you'll have to start off by installing `spocc`. There are two ways to do this: either go to your "Packages" tab in RStudio, click "Install", and type in "spocc", or run the following code:
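```{r, eval=FALSE}
#Run this once; eval=FALSE stops the package from being reinstalled every time the notebook is knit
install.packages("spocc")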
```
You only have to install a package once on your computer, but you have to load the package every time you reopen RStudio:
```{r}
library(spocc)
```
## Retrieve data
The most important function in the package is the `occ` function.
**Try it yourself:** Use the "Help" menu to search for the `occ` function. What does the function do?
Now it's time to try running the function!
```{r}
#Running occ() for the Barbados threadsnake:
t_carlae_occ <- occ(query = "Tetracheilostoma carlae", from = 'gbif')
#Running occ() for your organism:
```
**Try it yourself:** Replace "Tetracheilostoma carlae" with the scientific name of Your Favorite Organism (YFO). Run `occ`.
Now we want to get the occurrence data into a dataframe format to get the information we need to make maps.
**Try it yourself:** Look up the `occ2df` function in "Help". Fill in the rest of the code below to make a dataframe of occurrences for your species:
```{r}
sp_df <- occ2df()
#View the dataframe you made
head(sp_df)
View(sp_df)
```
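Occurrence records downloaded from online databases sometimes lack coordinates, and those rows cannot be placed on a map. The chunk below is a minimal sketch of how you might check for and drop such rows. It uses an invented data frame called `toy_df` (a hypothetical name) whose columns are assumed to match what `occ2df` returns; check the column names of your own `sp_df` before adapting it.

```{r}
# Toy data frame with the column names assumed for occ2df() output
# (name, longitude, latitude, prov, date, key). All values are invented.
toy_df <- data.frame(
  name = rep("Example species", 4),
  longitude = c(-59.5, NA, -59.6, -59.4),
  latitude = c(13.1, 13.2, NA, 13.3),
  prov = rep("gbif", 4),
  date = as.Date(c("2008-06-01", "2010-03-15", "2012-07-20", "2015-01-05")),
  key = c("1", "2", "3", "4"),
  stringsAsFactors = FALSE
)

# Count rows that are missing a longitude or a latitude
n_missing <- sum(is.na(toy_df$longitude) | is.na(toy_df$latitude))
n_missing

# Keep only rows where both coordinates are present
toy_clean <- toy_df[complete.cases(toy_df[, c("longitude", "latitude")]), ]
nrow(toy_clean)
```

The same two steps should work on `sp_df` once you have filled in the `occ2df` call above.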
## Map data
Now that you've downloaded the data from GBIF (using `occ`), and formatted it a bit (using `occ2df`), you're ready to do some mapping. We're going to use the `ggmap` package again like we did with the GPS points you collected.
```{r}
#Load ggmap
library(ggmap)
#Set the API key again (uncomment the next line and add your own Google API key)
# api_key =
register_google(key = api_key)
```
We're going to start by creating a data frame for a map of the entire world and then plotting it:
```{r}
#Make dataframe:
world_map <- map_data("world")
#Assign the world map plot to the variable "world"
world <- ggplot() +
  geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
               fill = "grey", color = "darkgrey")
#plot the world map
world
```
Now we want to add the occurrence data to our map. The following code looks a lot like the code we used to map your points in Central Park (and the alien sightings) -- the process of plotting species occurrence points is the same as the process to plot GPS points you took yourself.
```{r}
world +
  geom_point(data = sp_df, aes(x = longitude, y = latitude),
             color = "green",
             size = 1)
```
Depending on what your species is, you will probably want to zoom in to a specific part of the globe. We can do this by making a bounding box again:
```{r}
#This time, we specify the bounding box using the latitude and longitude columns from our data
bound_box <- make_bbox(lon = sp_df$longitude, lat = sp_df$latitude, f = 2)
#Get a satellite map at the location of the bounding box you made:
bbox_map <- get_map(location = bound_box, maptype = "satellite", source = "google")
```
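To get a feel for what the `f` argument does, here is a rough base R sketch of a bounding box calculation. It assumes that `make_bbox` widens the range of your coordinates by the fraction `f` on every side; that is my understanding of its behavior rather than something confirmed here, so treat `manual_box` (a hypothetical name) as an illustration only. The coordinates are invented.

```{r}
# Invented coordinates for illustration only
lon <- c(-60, -59)
lat <- c(13, 14)
f <- 2

# extendrange() widens a range by f times its width on each side
lon_range <- extendrange(lon, f = f)
lat_range <- extendrange(lat, f = f)

# Assemble the box in left/bottom/right/top order
manual_box <- c(left = lon_range[1], bottom = lat_range[1],
                right = lon_range[2], top = lat_range[2])
manual_box
```

With `f = 2` the box in this sketch ends up five times as wide and as tall as the points themselves, which leaves a generous margin around them. A smaller `f` gives a tighter zoom.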
Now plot the map zoomed into the bounding box you created:
```{r}
ggmap(bbox_map) +
  geom_point(data = sp_df, aes(x = longitude, y = latitude),
             color = "red",
             size = 1)
```
**Try it yourself:** Open up a new R script file. In this script you're going to write your own code to get occurrence data for sloths! Use the code from this tutorial as a model, but modify it for your species of sloth. You can also customize the graphs as much as you want -- play with different parameters like color, extent of the map, etc.
When you save the file:
+ Save it in the intern_code folder
+ Name it in the following way (using the name of your sloth species and your own first name): `bradypus_tridactylus_occ_cecina`
## Bonus
### Occurrence date
Look at the `sp_df` dataframe for your sloth species. Notice we have a column for the date of the occurrences. Use your new `ggplot2` skills to color the points on your map according to the date of occurrence (look back at your UFO code if you want a hint).
```{r}
```
Is there any pattern in where the newest occurrences were found? How might the date an occurrence was collected affect data quality?
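Before thinking about patterns, it may help to see how old and how new your records actually are. The chunk below is a small sketch using invented dates in a hypothetical data frame called `toy_dates`; it assumes the dates are stored (or can be converted) as `Date` values, which you should verify for your own `sp_df`.

```{r}
# Invented occurrence dates, stored as text
toy_dates <- data.frame(
  date = c("1998-04-12", "2005-09-30", "2005-11-02", "2019-06-21"),
  stringsAsFactors = FALSE
)

# Convert the text to Date values so R can compare them
toy_dates$date <- as.Date(toy_dates$date)

# Oldest and newest records
range(toy_dates$date)

# Number of records per year
toy_dates$year <- as.integer(format(toy_dates$date, "%Y"))
records_per_year <- table(toy_dates$year)
records_per_year
```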
### Use a different data source
The `spocc` package lets you get data from more sources than GBIF. You can also use BISON (https://bison.usgs.gov/#home), iNaturalist (https://www.inaturalist.org/), eBird (https://ebird.org/home), Ecoengine (https://ecoengine.berkeley.edu/), and VertNet (http://www.vertnet.org/index.html). Some of these sources cover only certain regions or groups of organisms, e.g. BISON is only for US species and eBird is only for birds. Pick a source that would work either for the sloths or for the species you originally picked at the beginning of the tutorial.
Rerun `occ` using the new source:
```{r}
spp_occ <- occ(query = "species name", from = "source")
```
Go through the steps in the rest of the tutorial to plot the occurrences from this data source. How does it compare to what you got from GBIF?
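One way to make that comparison concrete is to stack the two data frames and summarize them by source. The sketch below uses two invented data frames, `toy_gbif` and `toy_inat` (hypothetical names), and assumes each has a `prov` column naming the source, as I expect `occ2df` output to have. The `"inat"` label is likewise an assumption about how iNaturalist records are tagged.

```{r}
# Two invented data frames standing in for occ2df() output from two sources
toy_gbif <- data.frame(longitude = c(-60.1, -58.3, -55.0),
                       latitude = c(4.2, 3.9, 5.1),
                       prov = "gbif")
toy_inat <- data.frame(longitude = c(-59.0, -57.7),
                       latitude = c(4.8, 4.0),
                       prov = "inat")

# Stack them and count records per source
toy_both <- rbind(toy_gbif, toy_inat)
source_counts <- table(toy_both$prov)
source_counts

# Compare the geographic spread (min and max coordinates) of each source
aggregate(cbind(longitude, latitude) ~ prov, data = toy_both, FUN = range)
```

A combined data frame like `toy_both` can also be passed to `geom_point` with `color = prov` inside `aes()` to show both sources on one map.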