2 Preliminaries
Some of the code in this tutorial takes considerable time to run; in these cases
precomputed results have been included in the package as data files. The
tutorial marks time-consuming code with the following warning/alternative
statements:
WARNING: The following steps are time-consuming.
> Some time consuming code
ALTERNATIVE: Load pre-computed results.
> An option to load precomputed results.
End of alternative
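The warning/alternative pattern can be sketched in base R as a small compute-or-load helper. This is only an illustration of the general idea: the tutorial's own precomputed results ship as package data files, and the helper name cached, the temporary file, and the toy computation are all hypothetical.

```r
## Hypothetical helper: run an expensive step once and cache the result on disk
cached <- function(file, compute) {
  if (file.exists(file)) {
    readRDS(file)          # ALTERNATIVE: load the pre-computed result
  } else {
    result <- compute()    # WARNING: the time-consuming step
    saveRDS(result, file)
    result
  }
}

cache.file <- tempfile(fileext = ".rds")
slow.sum <- function() { Sys.sleep(0.1); sum(1:100) }  # stand-in for slow code
first  <- cached(cache.file, slow.sum)  # computes and saves
second <- cached(cache.file, slow.sum)  # loads from disk
```

The second call returns the stored value without re-running the slow step.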
Here we will study NOx data from Los Angeles. The data are described
in subsection 1.2 and consist of 25 different monitor locations, with 2-week
average log NOx concentrations measured for 280 2-week periods.
First load the package, along with a few additional packages needed by the
tutorial:
> library(SpatioTemporal)
> library(plotrix)
> library(maps)
2.1 The STdata object
The basic S3-object in this package, collecting covariates and observations,
is an STdata-object. In the following, an STdata-object will be created
from the data; thereafter the structure and the components of the object are
described.


2.1.1 Creating an STdata object from raw data
The data used in this example are contained in data(mesa.data.raw), which
we load and examine.
> data(mesa.data.raw, package="SpatioTemporal")
> str(mesa.data.raw,1)
List of 3
$ X :'data.frame': 25 obs. of 12 variables:
$ obs : num [1:280, 1:25] 4.58 3.89 4.01 4.08 3.73 ...
..- attr(*, "dimnames")=List of 2
$ lax.conc.1500: num [1:280, 1:25] 2.32 1.84 1.49 2.59 1.9 ...
..- attr(*, "dimnames")=List of 2
As we can see, mesa.data.raw consists of a list with two matrices and one
data.frame; these contain the observations ("obs"), geographic covariates
("X") and spatio-temporal covariates ("lax.conc.1500") of the example.
We will use the createSTdata() function to create the STdata object. The
createSTdata() function requires (at least) two arguments: obs and covars.
Spatio-temporal covariates can be supplied through the optional argument
SpatioTemporal. An example of possible input for the covars argument is
given by the X data frame of mesa.data.raw:
> head(mesa.data.raw$X)
ID x y long lat type
1 60370002 -10861.67 3793.589 -117.923 34.1365 AQS
2 60370016 -10854.95 3794.456 -117.850 34.1443 AQS
3 60370030 -10888.66 3782.332 -118.216 34.0352 AQS
4 60370031 -10891.42 3754.649 -118.246 33.7861 AQS
5 60370113 -10910.76 3784.099 -118.456 34.0511 AQS
6 60371002 -10897.96 3797.979 -118.317 34.1760 AQS
log10.m.to.a1 log10.m.to.a2 log10.m.to.a3 log10.m.to.road
1 2.861509 4.100755 2.494956 2.494956
2 3.461672 3.801059 2.471498 2.471498
3 2.561133 3.695772 1.830197 1.830197
4 3.111413 2.737527 2.451927 2.451927
5 2.762193 3.687412 2.382281 2.382281
6 2.760931 4.035977 1.808260 1.808260
km.to.coast s2000.pop.div.10000
1 15.000000 1.733283
2 15.000000 1.645386
3 15.000000 6.192630
4 1.023311 2.088930
5 6.011075 7.143731
6 15.000000 4.766780



Above we can see an excerpt of mesa.data.raw$X. In this example,
mesa.data.raw$X contains information about the monitoring locations, including:
names (or ID’s), x- and y-coordinates, covariates from a GIS to be
used in the LUR, monitor type, longitudes and latitudes. The covars argument
of createSTdata() should, at a minimum, include coordinates and
covariates for all locations. Observations are matched to the locations by
matching the column names of obs (see below) to 1) names given by an ID field
in covars; 2) the row names of covars; or 3) names inferred from the ordering
of covars; see stCheckCovars.
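The first matching rule (column names of obs against an ID field) can be illustrated with base R's match(). This is a sketch on made-up toy data, not the package's internal code; the location names and values are hypothetical.

```r
## Toy covariates with an ID field (values are made up for illustration)
covars <- data.frame(ID = c("A", "B", "C"), x = c(0, 1, 2), y = c(0, 0, 1),
                     stringsAsFactors = FALSE)
## Toy observation matrix: 2 time points, columns in a different order
obs <- matrix(c(1.1, 1.2, 2.1, 2.2), nrow = 2,
              dimnames = list(c("2000-01-01", "2000-01-15"), c("C", "A")))
## Row of covars that each observation column refers to
idx <- match(colnames(obs), covars$ID)
idx
## An NA in idx would flag a column without a matching location
stopifnot(!anyNA(idx))
covars[idx, c("ID", "x", "y")]
```

Because the match is by name, the column order of obs does not need to follow the row order of covars.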
Next, examine the $obs part of the raw data.
> mesa.data.raw$obs[1:6,1:5]
60370002 60370016 60370030 60370031 60370113
1999-01-13 4.577684 4.131632 NA NA 4.727882
1999-01-27 3.889091 3.543566 NA NA 4.139332
1999-02-10 4.013020 3.632424 NA NA 4.054051
1999-02-24 4.080691 3.842586 NA NA 4.392799
1999-03-10 3.728085 3.396944 NA NA 3.960577
1999-03-24 3.751913 3.626161 NA NA 3.958741
In this example the observations are stored as a (number of time-points)-
by-(number of locations) matrix with missing observations denoted by NA;
the row and column names identify the time point and location of each observation.
Alternatively, one could have the observations as a data frame
with three fields: date, ID and obs. The format of mesa.data.raw$obs as
a matrix is most convenient for data with few (or no) missing observations.
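The relationship between the two formats can be sketched in base R by stacking a wide matrix into a three-field data frame and dropping the missing values. The toy matrix and the location names s1 and s2 are made up for illustration.

```r
## Toy wide matrix: 3 dates by 2 locations, with one missing value
wide <- matrix(c(4.5, 3.9, NA, 4.1, 3.5, 3.6), nrow = 3,
               dimnames = list(c("1999-01-13", "1999-01-27", "1999-02-10"),
                               c("s1", "s2")))
## Stack column by column, then drop the missing observations
long <- data.frame(obs  = as.vector(wide),
                   date = as.Date(rep(rownames(wide), times = ncol(wide))),
                   ID   = rep(colnames(wide), each = nrow(wide)),
                   stringsAsFactors = FALSE)
long <- long[!is.na(long$obs), ]
long
```

With many missing observations the long form is more compact, since it stores only the observed values.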
The final element is a spatio-temporal covariate, i.e. the output from the
Caline3QHC model (see subsubsection 1.2.2),
> mesa.data.raw$lax.conc.1500[1:6,1:5]
60370002 60370016 60370030 60370031 60370113
1999-01-13 2.3188 0 8.0641 0.1467 2.9894
1999-01-27 1.8371 0 7.3568 0.2397 4.7381
1999-02-10 1.4886 0 6.3673 0.2463 4.3922
1999-02-24 2.5868 0 7.1783 0.1140 3.3456
1999-03-10 1.8996 0 6.3159 0.1537 3.8495
1999-03-24 2.0162 0 6.3277 0.1906 3.2170


This matrix contains spatio-temporal covariate values for all locations and
times. Similar to the mesa.data.raw$obs matrix, the row- and column
names of the mesa.data.raw$lax.conc.1500 matrix contain the dates and
location ID’s of the spatio-temporal covariate.
The measurement locations, LUR information, observations and spatio-temporal
covariates (optional) above constitute the basic raw data needed by
the createSTdata() function. Given these minimal elements, creation of
the STdata structure is easy:
> ##matrix of observations
> obs <- mesa.data.raw$obs
> ##data.frame/matrix of covariates
> covars <- mesa.data.raw$X
> ##list/3D-array with the spatio-temporal covariates
> ST.list <- list(lax.conc.1500=mesa.data.raw$lax.conc.1500)
> ##create STdata object
> mesa.data <- createSTdata(obs, covars, SpatioTemporal=ST.list,
n.basis=2)
A few things to note here: we must first convert the
mesa.data.raw$lax.conc.1500 spatio-temporal covariate matrix to a list
(or 3D-array); the length of this list equals the number of spatio-temporal
covariates we want to use (in this case, just 1). We also specified n.basis=2,
which indicates we want to compute 2 temporal trends; for a discussion on
how to determine suitable temporal trends (or basis functions) see Section 4.3
in vignette("ST_intro", package="SpatioTemporal").
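To make the list versus 3D-array distinction concrete, the following base R sketch stacks a named list of (time × location) matrices into a (time × location × covariate) array. The covariate names cov1 and cov2 and all values are hypothetical; how createSTdata() performs this conversion internally is not shown here.

```r
## Two toy spatio-temporal covariates, each a (time x location) matrix
dn <- list(c("2000-01-01", "2000-01-15"), c("A", "B", "C"))
ST.list <- list(cov1 = matrix(1:6, 2, 3, dimnames = dn),
                cov2 = matrix(7:12, 2, 3, dimnames = dn))
## Stack them into a (time x location x covariate) array
ST.array <- array(unlist(ST.list), dim = c(2, 3, length(ST.list)),
                  dimnames = c(dn, list(names(ST.list))))
dim(ST.array)
## Look up one value by date, location and covariate name
ST.array["2000-01-15", "B", "cov2"]
```

The length of the list becomes the size of the third dimension, and the list names become its dimnames.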
The resulting STdata-object contains a number of elements, described in the
following Sections (2.1.2–2.1.6).
> names(mesa.data)
[1] "obs" "covars" "SpatioTemporal"
[4] "trend" "trend.fnc"
2.1.2 The mesa.data$covars Data Frame


We begin our examination of the data by investigating mesa.data$covars:
> head(mesa.data$covars)
ID x y long lat type
1 60370002 -10861.67 3793.589 -117.923 34.1365 AQS
2 60370016 -10854.95 3794.456 -117.850 34.1443 AQS
3 60370030 -10888.66 3782.332 -118.216 34.0352 AQS
4 60370031 -10891.42 3754.649 -118.246 33.7861 AQS
5 60370113 -10910.76 3784.099 -118.456 34.0511 AQS
6 60371002 -10897.96 3797.979 -118.317 34.1760 AQS
log10.m.to.a1 log10.m.to.a2 log10.m.to.a3 log10.m.to.road
1 2.861509 4.100755 2.494956 2.494956
2 3.461672 3.801059 2.471498 2.471498
3 2.561133 3.695772 1.830197 1.830197
4 3.111413 2.737527 2.451927 2.451927
5 2.762193 3.687412 2.382281 2.382281
6 2.760931 4.035977 1.808260 1.808260
km.to.coast s2000.pop.div.10000
1 15.000000 1.733283
2 15.000000 1.645386
3 15.000000 6.192630
4 1.023311 2.088930
5 6.011075 7.143731
6 15.000000 4.766780
The covars data frame is a 25 × 12 data frame. The first field contains the
ID, or name, of each of the 25 locations; this is the only mandatory field
in covars and will be added by createSTdata if missing. The second and
third fields contain x- and y-coordinates, which are used to calculate distances
between locations. The following fields contain longitude and latitude
coordinates; a field describing the type of monitoring system to which each
location belongs; and LUR covariates. In this example, the LUR covariates
are log10 meters to A1, A2, A3 roads and the minimum of these three measurements;
kilometres to the coast; and average population density in a 2 km
buffer (divided by 10,000).
In addition to the ID field, the type field is also special; when it exists it is
used to separate different types of observation locations (used by e.g. the
summary and plot functions). If included, this field should contain factors
or strings. In this example, we have two types: AQS refers to the EPA’s
regulatory monitors that are part of the Air Quality System, while FIXED
refers to the MESA Air locations.


Although we have observations at all the locations in this example, one could
also include locations in mesa.data$covars that do not have observations in
order to predict at those locations (see Appendix A for a prediction example).
The following code plots these locations on a map, shown in Figure 1.
> ###Plot the locations, see Figure 1
> par(mfrow=c(1,1))
> plot(mesa.data$covars$long, mesa.data$covars$lat,
pch=24, bg=c("red","blue")[mesa.data$covars$type],
xlab="Longitude", ylab="Latitude")
> ###Add the map of LA
> map("county", "california", col="#FFFF0055", fill=TRUE,
add=TRUE)
> ##Add a legend
> legend("bottomleft", c("AQS","FIXED"), pch=24, bty="n",
pt.bg=c("red","blue"))
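The colour selection in the plot call relies on indexing a vector by a factor, which uses the integer level codes. The sketch below, on a made-up type vector, shows this and also what happens if the field is a character vector instead; whether the type field is a factor or a string in a given session is an assumption to check, for example with class().

```r
## Toy monitor types (made up); factor levels sort alphabetically
type <- factor(c("AQS", "FIXED", "AQS", "AQS", "FIXED"))
cols <- c("red", "blue")[type]           # indexes by integer level codes
cols
## A character vector does not work as an index into an unnamed vector
type.chr <- as.character(type)
cols.chr <- c("red", "blue")[type.chr]   # all NA: no element names to match
## Converting to a factor with explicit levels restores the mapping
cols.fix <- c("red", "blue")[factor(type.chr, levels = c("AQS", "FIXED"))]
```

Setting the levels explicitly also makes the colour assignment independent of the alphabetical default.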
2.1.3 The mesa.data$trend Data Frame
Next, look at mesa.data$trend and mesa.data$trend.fnc:
Figure 1: Location of monitors in the Los Angeles area (longitude versus latitude; legend: AQS, FIXED).
> head(mesa.data$trend)
V1 V2 date
1 -1.8591693 1.20721096 1999-01-13
2 -1.5200057 0.90473775 1999-01-27
3 -1.1880840 0.62679098 1999-02-10
4 -0.8639833 0.38411634 1999-02-24
5 -0.5536476 0.19683161 1999-03-10
6 -0.2643623 0.08739755 1999-03-24
> head(mesa.data$trend.fnc)
1 function (x = date.ind)
2 {
3 X.comps <- matrix(NA, length(x), length(spline))

4 for (i in 1:length(spline)) {
5 X.comps[, i] <- scale(predict(spline[[i]], as.double(x))$y,
6 center = scale.spline[[i]][1], scale = scale.spline[[i]][2])
The trend data frame consists of 2 smooth temporal basis functions computed
using singular value decomposition (SVD). These temporal trends correspond
to the f_i(t) in (2). The spatio-temporal model also includes an
intercept, i.e. a vector of 1’s; the intercept is added automatically and should
not be included in trend. Additionally, the function used to compute the
smooth trends is stored in trend.fnc and can be used to compute temporal
trends at additional time points; for observed time points trend.fnc
returns the elements in trend.
> cbind(mesa.data$trend.fnc(mesa.data$trend$date[1:5]),
mesa.data$trend[1:5,])
V1 V2 V1 V2
1999-01-13 -1.8591693 1.2072110 -1.8591693 1.2072110
1999-01-27 -1.5200057 0.9047378 -1.5200057 0.9047378
1999-02-10 -1.1880840 0.6267910 -1.1880840 0.6267910
1999-02-24 -0.8639833 0.3841163 -0.8639833 0.3841163
1999-03-10 -0.5536476 0.1968316 -0.5536476 0.1968316
date
1999-01-13 1999-01-13
1999-01-27 1999-01-27
1999-02-10 1999-02-10
1999-02-24 1999-02-24
1999-03-10 1999-03-10
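The idea behind SVD-based smooth trends can be sketched in base R as follows. This is a simplified illustration on simulated, complete data, not the package's algorithm: it assumes no missing values, uses the default smoothing of smooth.spline(), and all sizes and the simulated signal are made up.

```r
set.seed(1)
n.t <- 100; n.loc <- 8
t.ind <- seq_len(n.t)
## Toy complete data: a shared seasonal signal plus noise (made up)
Y <- outer(sin(2 * pi * t.ind / 26), runif(n.loc, 0.5, 1.5)) +
  matrix(rnorm(n.t * n.loc, sd = 0.3), n.t, n.loc)
## Leading left singular vectors of the column-centred data
sv <- svd(scale(Y, center = TRUE, scale = FALSE))
n.basis <- 2
## Smooth each vector over time and standardise it
trend <- sapply(seq_len(n.basis), function(i) {
  fit <- smooth.spline(t.ind, sv$u[, i])
  as.vector(scale(predict(fit, t.ind)$y))
})
colnames(trend) <- paste0("V", seq_len(n.basis))
head(trend)
```

Keeping the fitted spline objects would allow evaluation at new time points, which is the role trend.fnc plays above.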
The mesa.data$trend data frame is 280 × 3, where 280 is the number of
time points for which we have NOx concentration measurements. Here,
the first two columns contain smooth temporal trends, and the last column
contains dates in the R date format. In general, one of the columns in
mesa.data$trend must be called date and have dates in the R date format;
the names of the other columns are arbitrary. Studying the date component,
> range(mesa.data$trend$date)
[1] "1999-01-13" "2009-09-23"
we see that measurements are made over a period of about 10 years, from
January 13, 1999 until September 23, 2009.
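As a quick consistency check, the date range can be related to the 280 time points. The sketch assumes the dates are regularly spaced every 14 days, which the first rows shown above suggest but which is not stated explicitly.

```r
## Assumption: one time point every 14 days between the two end dates
dates <- seq(as.Date("1999-01-13"), as.Date("2009-09-23"), by = "14 days")
length(dates)   # number of 2-week time points under this assumption
range(dates)
## Length of the study period in years
as.numeric(diff(range(dates)), units = "days") / 365.25
```

Under this assumption the sequence has exactly 280 elements and ends on 2009-09-23, in agreement with the dimensions of mesa.data$trend.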


2.1.4 The mesa.data$obs Data Frame
The observations are stored in mesa.data$obs:
> head(mesa.data$obs)
obs date ID
1 4.577684 1999-01-13 60370002
2 3.889091 1999-01-27 60370002
3 4.013020 1999-02-10 60370002
4 4.080691 1999-02-24 60370002
5 3.728085 1999-03-10 60370002
6 3.751913 1999-03-24 60370002
The data frame, mesa.data$obs, consists of observations, over time, for each
of the 25 locations. The data.frame contains three variables: obs — the
measured log NOx concentrations; date — the date of each observation; and
ID — labels indicating at which monitoring location each measurement was
taken. Details regarding the monitoring can be found in Cohen et al. (2009),
and a brief introduction is given in subsection 1.2.
The ID values should correspond to the ID of the monitoring locations given
in mesa.data$covars$ID. The dates in mesa.data$obs should correspond
to dates in mesa.data$trend$date, although, as for mesa.data$covars$ID,
additional, unobserved dates are allowed in mesa.data$trend$date.
Note that the number of rows in mesa.data$obs is 4577, far fewer than the
280 × 25 = 7000 observations there would be if each location had a complete
time series of observations.
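The amount of missing data implied by these numbers is simple arithmetic, shown here in R using only the figures quoted above.

```r
n.obs  <- 4577       # rows in mesa.data$obs
n.full <- 280 * 25   # time points times locations
n.full - n.obs              # space-time points without an observation
round(n.obs / n.full, 3)    # fraction of the full grid that is observed
```

Roughly two thirds of the full space-time grid is observed.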
2.1.5 The mesa.data$SpatioTemporal Array
Finally, examine the mesa.data$SpatioTemporal data:
> dim(mesa.data$SpatioTemp)
[1] 280 25 1
> mesa.data$SpatioTemp[1:5,1:5,,drop=FALSE]
, , lax.conc.1500
60370002 60370016 60370030 60370031 60370113
1999-01-13 2.3188 0 8.0641 0.1467 2.9894
1999-01-27 1.8371 0 7.3568 0.2397 4.7381
1999-02-10 1.4886 0 6.3673 0.2463 4.3922
1999-02-24 2.5868 0 7.1783 0.1140 3.3456
1999-03-10 1.8996 0 6.3159 0.1537 3.8495
The mesa.data$SpatioTemp element should be a three dimensional array
containing spatio-temporal covariates. In this example dataset we have only
one covariate, which is the output from the Caline3QHC model, see subsection
1.2. If no spatio-temporal covariates are used mesa.data$SpatioTemp
should be set to NULL.
Of the three dimensions of mesa.data$SpatioTemp, the first (280) refers to
the number of time points where we have spatio-temporal covariate measurements,
the second (25) refers to the number of locations, and the third
(1) refers to the number of different spatio-temporal covariates. Though the
entire array is not shown here, it should be noted that values of the spatio-temporal
covariate are specified for all 280-by-25 space-time locations.
Again, this array could contain values of the spatio-temporal covariate(s) at
times and/or locations that do not have observations, in order to predict at
those times/locations.
The dimnames of the SpatioTemp array are used to match covariates with
observations, locations, and time points:
> str(dimnames(mesa.data$SpatioTemp))
List of 3
$ : chr [1:280] "1999-01-13" "1999-01-27" "1999-02-10" "1999-02-24" ...
$ : chr [1:25] "60370002" "60370016" "60370030" "60370031" ...
$ : chr "lax.conc.1500"
The rownames should match the dates of observations and the temporal
trends, i.e. they should be given by
> as.character(sort(unique(c(mesa.data$obs$date,
mesa.data$trend$date))))
the column names should match the location ID’s in
mesa.data$covars$ID, and the names of the third dimension
> dimnames(mesa.data$SpatioTemp)[[3]]
[1] "lax.conc.1500"
identifies the different spatio-temporal covariates.
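These naming requirements can be checked with a few lines of base R. The sketch below uses small made-up stand-ins for the obs, trend, covars and SpatioTemporal components; the covariate name cov1 and all values are hypothetical.

```r
## Toy pieces (made up) mimicking the structure described above
obs <- data.frame(obs = c(1.2, 0.8, 1.5),
                  date = as.Date(c("2000-01-01", "2000-01-15", "2000-01-01")),
                  ID = c("A", "A", "B"), stringsAsFactors = FALSE)
trend <- data.frame(V1 = c(-1, 0, 1),
                    date = as.Date(c("2000-01-01", "2000-01-15", "2000-01-29")))
covars <- data.frame(ID = c("A", "B"), stringsAsFactors = FALSE)
ST <- array(0, dim = c(3, 2, 1),
            dimnames = list(as.character(trend$date), covars$ID, "cov1"))
## Expected row names: all dates seen in the observations or the trends
expected.dates <- as.character(sort(unique(c(obs$date, trend$date))))
rows.ok <- identical(dimnames(ST)[[1]], expected.dates)
cols.ok <- identical(dimnames(ST)[[2]], covars$ID)
c(rows = rows.ok, cols = cols.ok)
```

A FALSE in either check would indicate that the array cannot be matched unambiguously to the dates or locations.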
2.1.6 Summaries of mesa.data
Now that we have gone over a detailed description of what is in the mesa.data
object, we can use the following function to examine a summary of the observations:
> print(mesa.data)
STdata-object with:
No. locations: 25 (observed: 25)
No. time points: 280 (observed: 280)
No. obs: 4577
Trend with 2 basis function(s):
[1] "V1" "V2"
with dates:
1999-01-13 to 2009-09-23
12 covariate(s):
[1] "ID" "x"
[3] "y" "long"
[5] "lat" "type"
[7] "log10.m.to.a1" "log10.m.to.a2"
[9] "log10.m.to.a3" "log10.m.to.road"
[11] "km.to.coast" "s2000.pop.div.10000"
1 spatio-temporal covariate(s):
[1] "lax.conc.1500"
All sites:
AQS FIXED
20 5
Observed:
AQS FIXED
20 5
For AQS:
Number of obs: 4178
Dates: 1999-01-13 to 2009-09-23
For FIXED:
Number of obs: 399
Dates: 2005-12-07 to 2009-07-01
Here we can see the number of AQS and FIXED locations in the mesa.data
structure. There are 20 AQS locations, which correspond to the number of
locations marked as AQS in mesa.data$covars$type, and 5 FIXED locations,
which correspond to the locations flagged as FIXED in
mesa.data$covars$type. We can also see that the observations are made
over the same range of time as the temporal trends; this is appropriate, as
discussed above. The summary also indicates the total number of locations
(and time points) as well as how many of these have been observed: No.
locations: 25 (observed: 25). In this example all of our locations have
been observed; Appendix A provides an example with unobserved locations.
To graphically depict where and when our observations occurred we plot the
monitor locations in time and space.
> ###Plot when observations occur, see Figure 2
> par(mfcol=c(1,1), mar=c(4.3,4.3,1,1))
> plot(mesa.data, "loc")
From Figure 2 we see that the MESA monitors only sampled during the
second half of the period. We also note that the number of observations varies
greatly between different locations.
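The same information can be tabulated numerically from a long-format observation data frame. The following base R sketch uses made-up toy data with hypothetical location names; it is not output from mesa.data.

```r
## Toy long-format observations (made up)
obs <- data.frame(ID = c("A", "A", "A", "B", "C", "C"),
                  date = as.Date(c("2000-01-01", "2000-01-15", "2000-01-29",
                                   "2000-01-29", "2000-01-01", "2000-01-15")),
                  stringsAsFactors = FALSE)
## Number of observations per location
n.per.loc <- table(obs$ID)
n.per.loc
## First observation date per location (ISO date strings sort correctly)
first.obs <- tapply(as.character(obs$date), obs$ID, min)
first.obs
```

Locations that start late or have few observations stand out directly in these two summaries.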