/
ncdfgeom.Rmd
292 lines (239 loc) · 12.1 KB
/
ncdfgeom.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
---
title: "NetCDF-CF Geometry and Timeseries Tools for R"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{NetCDF-CF Geometry and Timeseries Tools for R}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup_1, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width=6,
fig.height=4
)
options(scipen = 9999)
```
## Introduction
`ncdfgeom` is intended to write spatial geometries, their attributes, and timeseries data (that would typically be stored in two or more files) into a single file. The package provides functions to read and write NetCDF-CF Discrete Sampling Geometries point and timeseries feature types as well as NetCDF-CF spatial geometries. These utilities are meant to be general, but were designed to support working with typical geospatial feature data with linked attributes and time series in NetCDF. Supported data types include:
- Variables from R `data.frame` tables with one row per geometry are read from or written to NetCDF variables.
- `data.frame` tables with a time series per column are read from or written as NetCDF-CF DSG timeSeries FeatureType data.
- `data.frame` table variables with a single-time-step observation for a given point location in each row are written to a NetCDF-CF DSG Point FeatureType.
- `sp` and `sf` spatial point, line, and polygon types can be read from or written to NetCDF-CF geometry variables introduced in CF-1.8
For timeseries, two formats are supported:
1. a wide format `data.frame` with timesteps in rows and geometry "instances" in columns with required attributes of geometry "instances" provided separately. This format lends its self to data where the same timestamps are used for every row and data exists for all geometry instances for all time steps -- it is sometimes referred to as the orthogonal array, or "space-wide", encoding.
1. a long format `data.frame` where each row contains all the geometry "instance" metadata, a time stamp, and the variables to be stored for that time step. This format lends it self to data where each geometry instance has unique timesteps and/or data is not available for each geometry instance at the same timesteps (sparse arrays).
Additional read / write functions to include additional DSG feature types will be implemented in the future and contributions are welcomed. `ncdfgeom` is a work in progress. Please review the ["issues"](https://github.com/DOI-USGS/ncdfgeom/issues) list to submit issues and/or see what changes are planned.
## Installation
At the time of writing, installation is only available via `remotes` or building the package directly as one would for development purposes.
```
install.packages("remotes")
remotes::install_github("DOI-USGS/ncdfgeom")
```
When on CRAN, the package will be installed with the typical install.packages method.
```
install.packages("ncdfgeom")
```
For this demo, we'll use `sf`, `dplyr`, and `ncdfgeom`.
```{r libs, message=FALSE, warning=FALSE}
library(sf)
library(dplyr)
library(ncdfgeom)
```
## Sample Data
We will start with two dataframes: **`prcp_data` and `climdiv_poly`** containing precipitation estimate timeseries and polygon geometries that describe the boundaries of climate divisions respectively. The precipitation data has one time series for each geometry.
Code to download and prep these data is shown at the bottom. More info about the data can be found at: [doi:10.7289/V5M32STR](https://doi.org/10.7289/V5M32STR)
The data required for this demo has been cached in the `ncdfgeom` package and is available as shown below.
### `prcp_data`
```{r data_shape_2}
prcp_data <- readRDS(system.file("extdata/climdiv-pcpndv.rds", package = "ncdfgeom"))
print(prcp_data, max_extra_cols = 0)
plot(prcp_data$date, prcp_data$`0101`, col = "red",
xlab = "date", ylab = "monthly precip (inches)", main = "Sample Timeseries for 0101-'Northern Valley'")
lines(prcp_data$date, prcp_data$`0101`)
```
### `climdiv_poly`
```{r data_shape}
climdiv_poly <- read_sf(system.file("extdata/climdiv.gpkg", package = "ncdfgeom"))
print(climdiv_poly)
plot(st_geometry(climdiv_poly), main = "Climate Divisions with 0101-'Northern Valley' Highlighted")
plot(st_geometry(filter(climdiv_poly, CLIMDIV == "0101")), col = "red", add = TRUE)
```
As shown above, we have two `data.frame`s. One has 344 columns and the other 344 rows. These 344 climate divisions will be our "instance" dimension when we write to NetCDF.
## Write Timeseries and Geometry to NetCDF
The NetCDF discrete sampling geometries timeseries standard requires point lat/lon coordinate locations for timeseries data. In the code below, we calculate these values and write the timeseries data to a netcdf file.
```{r write_ts, warning = FALSE}
climdiv_centroids <- climdiv_poly %>%
st_transform(5070) %>% # Albers Equal Area
st_set_agr("constant") %>%
st_centroid() %>%
st_transform(4269) %>% #NAD83 Lat/Lon
st_coordinates() %>%
as.data.frame()
nc_file <- "climdiv_prcp.nc"
prcp_dates <- prcp_data$date
prcp_data <- select(prcp_data, -date)
prcp_meta <- list(name = "climdiv_prcp_inches",
long_name = "Estimated Monthly Precipitation (Inches)")
write_timeseries_dsg(nc_file = nc_file,
instance_names = climdiv_poly$CLIMDIV,
lats = climdiv_centroids$Y,
lons = climdiv_centroids$X,
times = prcp_dates,
data = prcp_data,
data_unit = rep("inches", (ncol(prcp_data) - 1)),
data_prec = "float",
data_metadata = prcp_meta,
attributes = list(title = "Demonstation of ncdfgeom"),
add_to_existing = FALSE)
climdiv_poly <- st_sf(st_cast(climdiv_poly, "MULTIPOLYGON"))
write_geometry(nc_file = "climdiv_prcp.nc",
geom_data = climdiv_poly,
variables = "climdiv_prcp_inches")
```
Now we have a file with a structure as shown in the `ncdump` output below.
```{r ncdump}
try({ncdump <- system(paste("ncdump -h", nc_file), intern = TRUE)
cat(ncdump, sep = "\n")}, silent = TRUE)
```
For more information about the polygon and timeseries data structures used here, see the [NetCDF-CF standard.](http://cfconventions.org/cf-conventions/cf-conventions.html)
## Read Timeseries and Geometry from NetCDF
Now that we have all our data in a single file, we can read it back in.
```{r read}
# First read the timeseries.
prcp_data <- read_timeseries_dsg("climdiv_prcp.nc")
# Now read the geometry.
climdiv_poly <- read_geometry("climdiv_prcp.nc")
```
Here's what the data we read in looks like:
```{r data_summary}
names(prcp_data)
class(prcp_data$time)
names(prcp_data$varmeta$climdiv_prcp_inches)
prcp_data$data_unit
prcp_data$data_prec
str(names(prcp_data$data_frames$climdiv_prcp_inches))
prcp_data$global_attributes
names(climdiv_poly)
```
```{r p_colors_source, echo=FALSE}
# Because we've gotta have pretty colors!
p_colors <- function (n, name = c("precip_colors")) {
# Thanks! https://quantdev.ssri.psu.edu/tutorials/generating-custom-color-palette-function-r
p_rgb <- col2rgb(c("#FAFBF3", "#F0F8E3", "#D4E9CA",
"#BBE0CE", "#B7DAD0", "#B0CCD7",
"#A9B8D7", "#A297C2", "#8F6F9E",
"#684A77", "#41234D"))
precip_colors = rgb(p_rgb[1,],p_rgb[2,],p_rgb[3,],maxColorValue = 255)
name = match.arg(name)
orig = eval(parse(text = name))
rgb = t(col2rgb(orig))
temp = matrix(NA, ncol = 3, nrow = n)
x = seq(0, 1, , length(orig))
xg = seq(0, 1, , n)
for (k in 1:3) {
hold = spline(x, rgb[, k], n = n)$y
hold[hold < 0] = 0
hold[hold > 255] = 255
temp[, k] = round(hold)
}
palette = rgb(temp[, 1], temp[, 2], temp[, 3], maxColorValue = 255)
palette
}
```
To better understand what is actually in this file, let's visualize the data we just read. Below we join a sum of the precipitation timeseries to the climate division polygons and plot them up.
```{r plot, fig.height=6, fig.width=8}
climdiv_poly <- climdiv_poly %>%
st_transform(3857) %>% # web mercator
st_simplify(dTolerance = 5000)
title <- paste0("\n Sum of: ", prcp_data$varmeta$climdiv_prcp_inches$long_name, "\n",
format(prcp_data$time[1],
"%Y-%m", tz = "UTC"), " - ",
format(prcp_data$time[length(prcp_data$time)],
"%Y-%m", tz = "UTC"))
prcp_sum <- apply(prcp_data$data_frames$climdiv_prcp_inches,
2, sum, na.rm = TRUE)
prcp <- data.frame(CLIMDIV = names(prcp_sum),
prcp = as.numeric(prcp_sum),
stringsAsFactors = FALSE) %>%
right_join(climdiv_poly, by = "CLIMDIV") %>%
st_as_sf()
plot(prcp["prcp"], lwd = 0.1, pal = p_colors,
breaks = seq(0, 14000, 1000),
main = title,
key.pos = 3, key.length = lcm(20))
```
This is the code used to download and prep the precipitation and spatial data. Provided for reproducibility and is not run here.
```{r setup_dontrun, eval = FALSE}
# Description here: ftp://ftp.ncdc.noaa.gov/pub/data/cirs/climdiv/divisional-readme.txt
prcp_url <- "ftp://ftp.ncdc.noaa.gov/pub/data/cirs/climdiv/climdiv-pcpndv-v1.0.0-20190408"
prcp_file <- "prcp.txt"
download.file(url = prcp_url, destfile = prcp_file, quiet = TRUE)
division_url <- "ftp://ftp.ncdc.noaa.gov/pub/data/cirs/climdiv/CONUS_CLIMATE_DIVISIONS.shp.zip"
division_file <- "CONUS_CLIMATE_DIVISIONS.shp.zip"
download.file(url = division_url, destfile = division_file, quiet = TRUE)
unzip("CONUS_CLIMATE_DIVISIONS.shp.zip")
climdiv_poly <- read_sf("GIS.OFFICIAL_CLIM_DIVISIONS.shp") %>%
select("CLIMDIV", CLIMDIV_NAME = "NAME") %>%
mutate(CLIMDIV = ifelse(nchar(as.character(CLIMDIV)) == 3,
paste0("0",as.character(CLIMDIV)),
as.character(CLIMDIV))) %>%
st_simplify(dTolerance = 0.0125)
month_strings <- c("jan", "feb", "mar", "apr", "may", "jun",
"jul", "aug", "sep", "oct", "nov", "dec")
prcp_data <- read.table(prcp_file, header = FALSE,
colClasses = c("character",
rep("numeric", 12)),
col.names = c("meta", month_strings))
# Here we gather the data into a long format and prep it for ncdfgeom.
prcp_data <- prcp_data %>%
gather(key = "month", value = "precip_inches", -meta) %>%
mutate(climdiv = paste0(substr(meta, 1, 2), substr(meta, 3, 4)),
year = substr(meta, 7, 10),
precip_inches = ifelse(precip_inches < 0, NA, precip_inches)) %>%
mutate(date = as.Date(paste0("01", "-", month, "-", year),
format = "%d-%b-%Y")) %>%
select(-meta, -month, -year) %>%
filter(climdiv %in% climdiv_poly$CLIMDIV) %>%
spread(key = "climdiv", value = "precip_inches") %>%
filter(!is.na(`0101`))
as_tibble()
# Now make sure things are in the same order.
climdiv_names <- names(prcp_data)[2:length(names(prcp_data))]
climdiv_row_order <- match(climdiv_names, climdiv_poly$CLIMDIV)
climdiv_poly <- climdiv_poly[climdiv_row_order, ]
sf::write_sf(climdiv_poly, "climdiv.gpkg")
saveRDS(prcp_data, "climdiv-pcpndv.rds")
unlink("GIS*")
unlink("CONUS_CLIMATE_DIVISIONS.shp.zip")
unlink("prcp.txt")
```
Here's the `p_colors` function used in plotting above.
```{r p_colors, eval=FALSE}
p_colors <- function (n, name = c("precip_colors")) {
# Thanks! https://quantdev.ssri.psu.edu/tutorials/generating-custom-color-palette-function-r
p_rgb <- col2rgb(c("#FAFBF3", "#F0F8E3", "#D4E9CA",
"#BBE0CE", "#B7DAD0", "#B0CCD7",
"#A9B8D7", "#A297C2", "#8F6F9E",
"#684A77", "#41234D"))
precip_colors = rgb(p_rgb[1,],p_rgb[2,],p_rgb[3,],maxColorValue = 255)
name = match.arg(name)
orig = eval(parse(text = name))
rgb = t(col2rgb(orig))
temp = matrix(NA, ncol = 3, nrow = n)
x = seq(0, 1, , length(orig))
xg = seq(0, 1, , n)
for (k in 1:3) {
hold = spline(x, rgb[, k], n = n)$y
hold[hold < 0] = 0
hold[hold > 255] = 255
temp[, k] = round(hold)
}
palette = rgb(temp[, 1], temp[, 2], temp[, 3], maxColorValue = 255)
palette
}
```
```{r cleanup, echo=FALSE}
unlink("climdiv_prcp.nc")
```