Skip to content

Commit

Permalink
Episode 10 updated by replacing calls to raster and rgdal packages wi…
Browse files Browse the repository at this point in the history
…th calls to terra package. datacarpentry#368 datacarpentry#363
  • Loading branch information
albhasan committed Mar 16, 2023
1 parent 231af26 commit 93540d1
Showing 1 changed file with 71 additions and 53 deletions.
124 changes: 71 additions & 53 deletions episodes/10-vector-csv-to-shapefile-in-r.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,7 @@ source("setup.R")
::::::::::::::::::::::::::::::::::::::::::::::::::

```{r load-libraries, echo=FALSE, results="hide", message=FALSE, warning=FALSE}
library(raster)
library(rgdal)
library(terra)
library(ggplot2)
library(dplyr)
library(sf)
Expand All @@ -43,13 +42,17 @@ point_HARV <- st_read("data/NEON-DS-Site-Layout-Files/HARV/HARVtower_UTM18N.shp"

## Things You'll Need To Complete This Episode

See the [lesson homepage](.) for detailed information about the software,
data, and other prerequisites you will need to work through the examples in this episode.
See the [lesson homepage](.) for detailed information about the software, data,
and other prerequisites you will need to work through the examples in this
episode.


::::::::::::::::::::::::::::::::::::::::::::::::::

This episode will review how to import spatial points stored in `.csv` (Comma Separated Value) format into R as an `sf` spatial object. We will also reproject data imported from a shapefile format, export this data as a shapefile, and plot raster and vector data as layers in the same plot.
This episode will review how to import spatial points stored in `.csv` (Comma
Separated Value) format into R as an `sf` spatial object. We will also
reproject data imported from a shapefile format, export this data as a
shapefile, and plot raster and vector data as layers in the same plot.

## Spatial Data in Text Format

Expand All @@ -65,17 +68,18 @@ We would like to:

Spatial data are sometimes stored in a text file format (`.txt` or `.csv`). If
the text file has an associated `x` and `y` location column, then we can
convert it into an `sf` spatial object. The `sf` object allows us to store both the `x,y` values that represent the coordinate location
of each point and the associated attribute data - or columns describing each
feature in the spatial object.
convert it into an `sf` spatial object. The `sf` object allows us to store both
the `x,y` values that represent the coordinate location of each point and the
associated attribute data - or columns describing each feature in the spatial
object.

We will continue using the `sf` and `raster` packages in this episode.

## Import .csv

To begin let's import a `.csv` file that contains plot coordinate `x, y`
locations at the NEON Harvard Forest Field Site (`HARV_PlotLocations.csv`) and look at the structure of
that new object:
To begin let's import a `.csv` file that contains plot coordinate `x, y`
locations at the NEON Harvard Forest Field Site (`HARV_PlotLocations.csv`) and
look at the structure of that new object:

```{r read-csv}
plot_locations_HARV <-
Expand All @@ -84,7 +88,11 @@ plot_locations_HARV <-
str(plot_locations_HARV)
```

We now have a data frame that contains 21 locations (rows) and 16 variables (attributes). Note that all of our character data was imported into R as factor (categorical) data. Next, let's explore the dataframe to determine whether it contains columns with coordinate values. If we are lucky, our `.csv` will contain columns labeled:
We now have a data frame that contains 21 locations (rows) and 16 variables
(attributes). Note that all of our character data was imported into R as
character (text) data. Next, let's explore the dataframe to determine whether
it contains columns with coordinate values. If we are lucky, our `.csv` will
contain columns labeled:

- "X" and "Y" OR
- Latitude and Longitude OR
Expand All @@ -98,18 +106,19 @@ names(plot_locations_HARV)

## Identify X,Y Location Columns

Our column names include several fields that might contain spatial information. The `plot_locations_HARV$easting`
and `plot_locations_HARV$northing` columns contain coordinate values. We can confirm
this by looking at the first six rows of our data.
Our column names include several fields that might contain spatial information.
The `plot_locations_HARV$easting` and `plot_locations_HARV$northing` columns
contain coordinate values. We can confirm this by looking at the first six rows
of our data.

```{r check-out-coordinates}
head(plot_locations_HARV$easting)
head(plot_locations_HARV$northing)
```

We have coordinate values in our data frame. In order to convert our
data frame to an `sf` object, we also need to know the CRS
associated with those coordinate values.
We have coordinate values in our data frame. In order to convert our data frame
to an `sf` object, we also need to know the CRS associated with those
coordinate values.

There are several ways to figure out the CRS of spatial data in text format.

Expand All @@ -119,8 +128,8 @@ There are several ways to figure out the CRS of spatial data in text format.
file header or somewhere in the data columns.

Following the `easting` and `northing` columns, there is a `geodeticDa` and a
`utmZone` column. These appear to contain CRS information
(`datum` and `projection`). Let's view those next.
`utmZone` column. These appear to contain CRS information (`datum` and
`projection`). Let's view those next.

```{r view-CRS-info}
head(plot_locations_HARV$geodeticDa)
Expand All @@ -140,23 +149,27 @@ we learned about the components of a `proj4` string. We have everything we need
to assign a CRS to our data frame.

To create the `proj4` associated with UTM Zone 18 WGS84 we can look up the
projection on the [Spatial Reference website](https://www.spatialreference.org/ref/epsg/wgs-84-utm-zone-18n/), which contains a list of CRS formats for each projection. From here, we can extract the [proj4 string for UTM Zone 18N WGS84](https://www.spatialreference.org/ref/epsg/wgs-84-utm-zone-18n/proj4/).
projection on the
[Spatial Reference website](https://www.spatialreference.org/ref/epsg/wgs-84-utm-zone-18n/),
which contains a list of CRS formats for each projection. From here, we can
extract the
[proj4 string for UTM Zone 18N WGS84](https://www.spatialreference.org/ref/epsg/wgs-84-utm-zone-18n/proj4/).

However, if we have other data in the UTM Zone 18N projection, it's much
easier to use the `st_crs()` function to extract the CRS in `proj4` format from that object and
assign it to our
new spatial object. We've seen this CRS before with our Harvard Forest study site (`point_HARV`).
However, if we have other data in the UTM Zone 18N projection, it's much easier
to use the `st_crs()` function to extract the CRS in `proj4` format from that
object and assign it to our new spatial object. We've seen this CRS before with
our Harvard Forest study site (`point_HARV`).

```{r explore-units}
st_crs(point_HARV)
```

The output above shows that the points shapefile is in
UTM zone 18N. We can thus use the CRS from that spatial object to convert our
non-spatial dataframe into an `sf` object.
The output above shows that the points shapefile is in UTM zone 18N. We can
thus use the CRS from that spatial object to convert our non-spatial dataframe
into an `sf` object.

Next, let's create a `crs` object that we can use to define the CRS of our
`sf` object when we create it.
Next, let's create a `crs` object that we can use to define the CRS of our `sf`
object when we create it.

```{r crs-object}
utm18nCRS <- st_crs(point_HARV)
Expand All @@ -167,16 +180,18 @@ class(utm18nCRS)

## .csv to sf object

Next, let's convert our dataframe into an `sf` object. To do
this, we need to specify:
Next, let's convert our dataframe into an `sf` object. To do this, we need to
specify:

1. The columns containing X (`easting`) and Y (`northing`) coordinate values
2. The CRS that the column coordinate represent (units are included in the CRS) - stored in our `utmCRS` object.

We will use the `st_as_sf()` function to perform the conversion.

```{r convert-csv-shapefile}
plot_locations_sp_HARV <- st_as_sf(plot_locations_HARV, coords = c("easting", "northing"), crs = utm18nCRS)
plot_locations_sp_HARV <- st_as_sf(plot_locations_HARV,
coords = c("easting", "northing"),
crs = utm18nCRS)
```

We should double check the CRS to make sure it is correct.
Expand All @@ -200,8 +215,9 @@ ggplot() +
In
[Open and Plot Shapefiles in R](06-vector-open-shapefile-in-r/)
we learned about spatial object extent. When we plot several spatial layers in
R using `ggplot`, all of the layers of the plot are considered in setting the boundaries
of the plot. To show this, let's plot our `aoi_boundary_HARV` object with our vegetation plots.
R using `ggplot`, all of the layers of the plot are considered in setting the
boundaries of the plot. To show this, let's plot our `aoi_boundary_HARV` object
with our vegetation plots.

```{r plot-data}
ggplot() +
Expand All @@ -210,8 +226,8 @@ ggplot() +
ggtitle("AOI Boundary Plot")
```

When we plot the two layers together, `ggplot` sets the plot boundaries
so that they are large enough to include all of the data included in all of the layers.
When we plot the two layers together, `ggplot` sets the plot boundaries so that
they are large enough to include all of the data included in all of the layers.
That's really handy!

::::::::::::::::::::::::::::::::::::::: challenge
Expand All @@ -223,7 +239,8 @@ locations.

Import the .csv: `HARV/HARV_2NewPhenPlots.csv` into R and do the following:

1. Find the X and Y coordinate locations. Which value is X and which value is Y?
1. Find the X and Y coordinate locations. Which value is X and which value is
Y?
2. These data were collected in a geographic coordinate system (WGS84). Convert
the dataframe into an `sf` object.
3. Plot the new points with the plot location points from above. Be sure to add
Expand All @@ -236,8 +253,7 @@ If you have extra time, feel free to add roads and other layers to your map!
## Answers

1)
First we will read in the new csv file and look at the
data structure.
First we will read in the new csv file and look at the data structure.

```{r}
newplot_locations_HARV <-
Expand All @@ -246,21 +262,22 @@ str(newplot_locations_HARV)
```

2)
The US boundary data we worked with previously is in a geographic
WGS84 CRS. We can use that data to establish a CRS for this data. First
we will extract the CRS from the `country_boundary_US` object and
confirm that it is WGS84.
The US boundary data we worked with previously is in a geographic WGS84 CRS. We
can use that data to establish a CRS for this data. First we will extract the
CRS from the `country_boundary_US` object and confirm that it is WGS84.

```{r}
geogCRS <- st_crs(country_boundary_US)
geogCRS
```

Then we will convert our new data to a spatial dataframe, using
the `geogCRS` object as our CRS.
Then we will convert our new data to a spatial dataframe, using the `geogCRS`
object as our CRS.

```{r}
newPlot.Sp.HARV <- st_as_sf(newplot_locations_HARV, coords = c("decimalLon", "decimalLat"), crs = geogCRS)
newPlot.Sp.HARV <- st_as_sf(newplot_locations_HARV,
coords = c("decimalLon", "decimalLat"),
crs = geogCRS)
```

Next we'll confirm that the CRS for our new object is correct.
Expand All @@ -269,9 +286,9 @@ Next we'll confirm that the CRS for our new object is correct.
st_crs(newPlot.Sp.HARV)
```

We will be adding these new data points to the plot we
created before. The data for the earlier plot was in UTM.
Since we're using `ggplot`, it will reproject the data for us.
We will be adding these new data points to the plot we created before. The data
for the earlier plot was in UTM. Since we're using `ggplot`, it will reproject
the data for us.

3) Now we can create our plot.

Expand All @@ -292,8 +309,8 @@ We can write an R spatial object to a shapefile using the `st_write` function
in `sf`. To do this we need the following arguments:

- the name of the spatial object (`plot_locations_sp_HARV`)
- the directory where we want to save our shapefile
(to use `current = getwd()` or you can specify a different path)
- the directory where we want to save our shapefile (to use `current = getwd()`
or you can specify a different path)
- the name of the new shapefile (`PlotLocations_HARV`)
- the driver which specifies the file format (ESRI Shapefile)

Expand All @@ -308,7 +325,8 @@ st_write(plot_locations_sp_HARV,

:::::::::::::::::::::::::::::::::::::::: keypoints

- Know the projection (if any) of your point data prior to converting to a spatial object.
- Know the projection (if any) of your point data prior to converting to a
spatial object.
- Convert a data frame to an `sf` object using the `st_as_sf()` function.
- Export an `sf` object as text using the `st_write()` function.

Expand Down

0 comments on commit 93540d1

Please sign in to comment.