-
Notifications
You must be signed in to change notification settings - Fork 0
/
replicate.Rmd
272 lines (226 loc) · 7.93 KB
/
replicate.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
---
title: "Ngme2 replicates feature"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Ngme2 replicates feature}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.align = "center",
fig.width=7, #not sure about width and height
fig.height=7
)
```
In this vignette we will introduce the feature of replicates, which allows to include multiple realizations of the
stochastic processes that can be included in the different models offered in `ngme2`.
## Description
Consider a stochastic process $\mathbf{W}$ indexed by an index set $\mathbf{I}$, the idea is to allow the inclusion of replicates of the process
in the model.
For example, if one wants to include k realizations of an autoregressive process of order 1 with $n$ observations in the model one can do so
in the following way
\begin{align}
X^{(j)}_1 &= \epsilon^{(j)}_1\\
X^{(j)}_i &= \alpha X^{(j)}_{i-1} + \epsilon^{(j)}_i, i = 2, \ \cdots,\ n
\end{align}
where $X^{(j)}_i$ denotes the $i \text{-}th$ observation of $j \text{-}th$ realization of the process for $i = 1, \ \cdots,\ n$ and $j = 1, \ \cdots,\ k$, with
$|\alpha| < 1$ and $\epsilon^{(j)}_1, \ \cdots \ , \epsilon^{(j)}_{n}$ is either i.i.d. **NIG** or **Gaussian** noise.
## Usage
### AR(1) model
The following example will show how to use the replicates feature for an AR(1) model.
Suppose one has a response variable with $2n$ observations
```{r}
library(ngme2)
library(ggplot2)
# creating ar1 process
n <- 500
myar <- ngme2::f(1:n, model="ar1", rho = .75, noise = noise_nig())
W1 <- simulate(myar, seed = 314)[[1]] # simulating one realization of the process
W2 <- simulate(myar, seed = 159)[[1]] # simulating another realization of the process
# Creating the response variable from the process and adding measurement noise
Y <- -3 + c(W1, W2) + rnorm(2 * n, sd = .2)
df <- data.frame(Y, idx = 1:(2 * n), group = rep(1:2, each = n)) # creating the data frame to fit the model later
```
```{r echo = F}
ggplot(df, aes(idx, Y)) +
geom_path(aes(color = as.factor(group))) +
labs(
x = "t", title = "Realization of the process",
subtitle = "(side by side)",
color = "Realization"
) +
scale_fill_manual(aesthetics = "color", values = c("black", "red")) +
theme_bw() +
theme(plot.title = element_text(hjust = .5))
```
Note that if both processes are considered as a single one instead of two realizations of the same process, then the first observation of the second realization is assumed to be
dependent on the last one of the first process, which might no be desirable in some cases.
```{r echo = F}
ggplot(df[1:n, ], aes(idx, Y)) +
geom_path(aes(linetype = "1", color = "1")) +
geom_path(
data = cbind.data.frame(idx = df[1:n, "idx"], Y = df[-(1:n), c("Y")]),
mapping = aes(idx, Y, linetype = "2", color = "2")
) +
labs(
x = "t", title = "Same process viewed as replicates",
linetype = "Realization", colour = "Realization"
) +
scale_color_manual(values = c("black", "red")) +
theme_bw() +
theme(plot.title = element_text(hjust = .5))
```
This can be seen in the $\mathbf{K}$ matrix of the model (all the models in the have the same form $K \mathbf{W} = \mathbf{\
epsilon}$), for instance, consider the following little example
to illustrate the last statement.
```{r}
no.replicates <- ngme2::f(1:6, model="ar1", rho = .5, noise = noise_normal())
replicates <- ngme2::f(rep(1:3, 2),
model = "ar1",
rho = .5,
noise = noise_normal(), replicate = rep(1:2, each = 3)
)
# K matrix from the model without replicates
replicates$operator$K
# K matrix from the model with replicates
no.replicates$operator$K
```
It is clear how the link between the last observation of the first process dissapears by looking at the $\mathbf{K}_{4, 3}$ entry of the matrix $\mathbf{K}$.
With that being said, one can fit the model using the `ngme` function as shown below.
```{r}
# fitting the model
mod.replicates <- ngme(
formula = Y ~ f(
c(1:n, 1:n),
model = "ar1",
noise = noise_nig()
),
replicate = df$group,
data = df,
control_opt = control_opt(
print_check_info = FALSE
)
)
mod.replicates
```
### SPDE Matern model
The following data is taken from the swamp of Cienaga Grande in Santa Marta, Colombia. There is a total of 114 locations where some properties of the
swamp were measured. Those measurements were taken twice, however there is no information available about their chronological order so this data cannot
be treated as spatiotemporal, despite that, the multiple measurements can be treated as replicates.
In this particular case the temperature is the feature that will be modeled.
```{r}
# library(ngme2)
library(ggplot2)
# reading the data and the boundary of Cienaga Grande
data(cienaga)
data(cienaga.border)
# scale the coords
cienaga.border[, 1] <- (cienaga.border[, 1] - mean(cienaga$East)) / sd(cienaga$East)
cienaga.border[, 2] <- (cienaga.border[, 2] - mean(cienaga$North)) / sd(cienaga$North)
cienaga <- within(cienaga, {
East_scale <- (East - mean(East)) / sd(East)
North_scale <- (North - mean(North)) / sd(North)
})
# creating label for the measurement group
cienaga$measurement <- rep(1:2, each = (n <- nrow(cienaga) / 2))
```
```{r echo=F}
ggplot(data = cienaga[cienaga$measurement == 1, ]) +
geom_point(aes(East_scale, North_scale,color = temp)) +
geom_path(data = cienaga.border, aes(East, North)) +
labs(
color = "Temperature [°C]",
title = "Temperature in Cienaga Grande",
subtitle = "First measurement"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = .5),
plot.subtitle = element_text(hjust = .5)
) +
viridis::scale_color_viridis()
ggplot(data = cienaga[cienaga$measurement == 1, ]) +
geom_point(aes(East_scale, North_scale,color = temp)) +
geom_path(data = cienaga.border, aes(East, North)) +
labs(
color = "Temperature [°C]",
title = "Temperature in Cienaga Grande",
subtitle = "second measurement"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = .5),
plot.subtitle = element_text(hjust = .5)
) +
viridis::scale_color_viridis()
```
Now we briefly show how to fit the model and how to predict for locations where the feature is unknown.
```{r}
# creating the mesh
mesh <- fmesher::fm_mesh_2d(
loc.domain = cienaga.border,
max.edge = c(0.4, 1),
max.n = 500
)
mesh$n
```
```{r echo=F}
plot(mesh)
lines(cienaga.border, col = "blue", lwd = 2)
points(cienaga[, c("East_scale", "North_scale")], pch = 20, col = "red", cex = 2)
# cienaga
cienaga$temp <- scale(cienaga$temp)
```
```{r}
# fitting the model
fit <- ngme(
formula = temp ~ 1 +
f(as.matrix(cienaga[, 1:2]), model = "matern", mesh=mesh,
name = "spde", noise = noise_nig()),
data = cienaga,
replicate=cienaga$measurement,
control_opt = control_opt(
estimation = T,
iterations = 500,
n_slope_check = 10,
n_parallel_chain = 4,
print_check_info = F
),
debug = F,
)
fit
```
With the fitted model, one can do predictions as follows.
```{r}
nxy <- c(300, 200)
projgrid <- rSPDE::rspde.mesh.projector(
mesh = mesh,
xlim = range(cienaga.border[, 1]), ylim = range(cienaga.border[, 2]),
dims = nxy
)
xy.in <- splancs::inout(projgrid$lattice$loc, as.matrix(cienaga.border[, 1:2]))
coord.prd <- projgrid$lattice$loc[xy.in, ]
lp <- predict(fit, map = list(spde=coord.prd)) # making the predictions
# getting the mean of the predictions for each replicate
preds_mean <- lp[["mean"]]
```
```{r echo=F}
ggplot() +
geom_point(aes(coord.prd[, 1], coord.prd[, 2], color = preds_mean)) +
# geom_point(data = cienaga, mapping = aes(East_scale, North_scale), size = 2) +
geom_path(aes(cienaga.border[, 1], cienaga.border[, 2])) +
viridis::scale_color_viridis() +
theme_minimal() +
labs(
x = "East", y = "North",
title = "Temperature predictions in Cienaga Grande",
color = "Temperature"
) +
theme(
plot.title = element_text(hjust = .5),
plot.subtitle = element_text(hjust = .5)
)
```