-
Notifications
You must be signed in to change notification settings - Fork 2
/
dynamic_factor_analysis.Rmd
250 lines (201 loc) · 9.58 KB
/
dynamic_factor_analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
---
title: "Dynamic factor analysis"
author: "James Thorson"
output: rmarkdown::html_vignette
#output: rmarkdown::pdf_document
vignette: >
%\VignetteIndexEntry{Dynamic factor analysis}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
# Install locally
# devtools::install_local( R'(C:\Users\James.Thorson\Desktop\Git\dsem)', force=TRUE )
# Build
# setwd(R'(C:\Users\James.Thorson\Desktop\Git\dsem)'); devtools::build_rmd("vignettes/dynamic_factor_analysis.Rmd")
```
## Dynamic factor analysis
`dsem` is an R package for fitting dynamic structural equation models (DSEMs) with a simple user-interface and generic specification of simultaneous and lagged effects in a potentially recursive structure. Here, we highlight how DSEM can be used to implement dynamic factor analysis (DFA). We specifically replicate analysis using the Multivariate Autoregressive State-Space (MARSS) package, using data that are provided as an example in the MARSS package.
```{r setup, echo=TRUE, message=FALSE}
library(dsem)
library(MARSS)
library(ggplot2)
data( harborSealWA, package="MARSS")
# Define helper function
grab = \(x,name) x[which(names(x)==name)]
# Define number of factors
# n_factors >= 3 doesn't seem to converge using DSEM or MARSS without penalties
n_factors = 2
```
## Using MARSS
We first illustrate a DFA model using two factors, fitted using MARSS:
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=7}
# Load data
dat <- t(scale(harborSealWA[,c("SJI","EBays","SJF","PSnd","HC")]))
# DFA with 3 states; used BFGS because it fits much faster for this model
fit_MARSS <- MARSS( dat,
model = list(m=n_factors),
form="dfa",
method="BFGS",
silent = TRUE )
```
We can then plot the estimated factors (latent variables):
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=3.5}
# Plots states using all data
plot(fit_MARSS, plot.type="xtT")
```
And the estimated predictor for measurements (manifest variables):
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=7}
# Plot expectation for data using all data
plot(fit_MARSS, plot.type="fitted.ytT")
```
## Full-rank covariance using DSEM
In DSEM syntax, we can first fit a saturated (full-covariance) model using the argument `covs`:
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=7, warning=FALSE}
# Add factors to data
tsdata = ts( cbind(harborSealWA[,c("SJI","EBays","SJF","PSnd","HC")]), start=1978)
# Scale and center (matches with MARSS does when fitting a DFA)
tsdata = scale( tsdata, center=TRUE, scale=TRUE )
# Define SEM
sem = "
# Random-walk process for variables
SJF -> SJF, 1, NA, 1
SJI -> SJI, 1, NA, 1
EBays -> EBays, 1, NA, 1
PSnd -> PSnd, 1, NA, 1
HC -> HC, 1, NA, 1
"
# Initial fit
mydsem0 = dsem( tsdata = tsdata,
covs = c("SJF, SJI, EBays, PSnd, HC"),
sem = sem,
family = rep("normal", 5),
control = dsem_control( quiet = TRUE,
run_model = FALSE ) )
# fix all measurement errors at diagonal and equal
map = mydsem0$tmb_inputs$map
map$lnsigma_j = factor( rep(1,ncol(tsdata)) )
#
mydsem_full = dsem( tsdata = tsdata,
covs = c("SJF, SJI, EBays, PSnd, HC"),
sem = sem,
family = rep("normal", 5),
control = dsem_control( quiet = TRUE,
map = map ) )
```
We can then define a custom function to plot states:
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=7}
plot_states = function( out,
vars=1:ncol(out$tmb_inputs$data$y_tj) ){
#
xhat_tj = as.list(out$sdrep,report=TRUE,what="Estimate")$z_tj[,vars,drop=FALSE]
xse_tj = as.list(out$sdrep,report=TRUE,what="Std. Error")$z_tj[,vars,drop=FALSE]
#
longform = expand.grid( Year=time(tsdata), Var=colnames(tsdata)[vars] )
longform$est = as.vector(xhat_tj)
longform$se = as.vector(xse_tj)
longform$upper = longform$est + 1.96*longform$se
longform$lower = longform$est - 1.96*longform$se
longform$data = as.vector(tsdata[,vars,drop=FALSE])
#
ggplot(data=longform) + #, aes(x=interaction(var,eq), y=Estimate, color=method)) +
geom_line( aes(x=Year,y=est) ) +
geom_point( aes(x=Year,y=data), color="blue", na.rm=TRUE ) +
geom_ribbon( aes(ymax=as.numeric(upper),ymin=as.numeric(lower), x=Year), color="grey", alpha=0.2 ) +
facet_wrap( facets=vars(Var), scales="free", ncol=2 )
}
plot_states( mydsem_full )
```
These estimated states follow the data more closely and have smaller estimated confidence intervals. Presumably this occurs because we are using a full-rank covariance so far.
## Reduced-rank factor model with measurement errors
Next, we can specify two factors factors while eliminating additional process error and estimating measurement errors. This requires us to switch to `gmrf_parameterization = "projection"`, so that we can fit a rank-deficient Gaussian Markov random field:
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=7}
# Add factors to data
tsdata = harborSealWA[,c("SJI","EBays","SJF","PSnd","HC")]
newcols = array( NA,
dim = c(nrow(tsdata),n_factors),
dimnames = list(NULL,paste0("F",seq_len(n_factors))) )
tsdata = ts( cbind(tsdata, newcols), start=1978)
# Scale and center (matches with MARSS does when fitting a DFA)
tsdata = scale( tsdata, center=TRUE, scale=TRUE )
#
sem = make_dfa( variables = c("SJI","EBays","SJF","PSnd","HC"),
n_factors = n_factors )
# Initial fit
mydsem0 = dsem( tsdata = tsdata,
sem = sem,
family = c( rep("normal",5), rep("fixed",n_factors) ),
estimate_delta0 = TRUE,
control = dsem_control( quiet = TRUE,
run_model = FALSE,
gmrf_parameterization = "projection" ) )
# fix all measurement errors at diagonal and equal
map = mydsem0$tmb_inputs$map
map$lnsigma_j = factor( rep(1,ncol(tsdata)) )
# Fix factors to have initial value, and variables to not
map$delta0_j = factor( c(rep(NA,ncol(harborSealWA)-1), 1:n_factors) )
# Fix variables to have no stationary mean except what's predicted by initial value
map$mu_j = factor( rep(NA,ncol(tsdata)) )
# profile "delta0_j" to match MARSS (which treats initial condition as unpenalized random effect)
mydfa = dsem( tsdata = tsdata,
sem = sem,
family = c( rep("normal",5), rep("fixed",n_factors) ),
estimate_delta0 = TRUE,
control = dsem_control( quiet = TRUE,
map = map,
use_REML = TRUE,
#profile = "delta0_j",
gmrf_parameterization = "projection" ) )
```
We again plot the estimated latent variables
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=3.5}
# Plot estimated factors
plot_states( mydfa, vars=5+seq_len(n_factors) )
```
and the estimated predictor for manifest variables
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=7}
# Plot estimated variables
plot_states( mydfa, vars=1:5 )
```
This results in similar (but not identical) factor values using MARSS and DSEM. In particular, DSEM has higher variance in early years. This likely arises because the default MARSS implementation of DFA includes a penalty of the initial state $\mathbf{x}_0$ with mean zero and variance of $5\mathbf{I}$. This term presumably provides additional information about the initial year such that MARSS DFA results are not invariant to reversing the order of the data.
To further explore, we can modify the MARSS DFA to eliminate the prior on initial conditions, based on help from Dr. Eli Holmes. This involves specifying:
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=7}
# Extract internal settings
modmats <- summary(fit_MARSS$model, silent=TRUE)
# Redefine defaults
modmats$V0 <- matrix(0, n_factors, n_factors )
modmats$x0 <- "unequal"
# Refit
fit_MARSS2 = MARSS( dat,
model = modmats,
silent = TRUE,
control = list( abstol = 0.001,
conv.test.slope.tol = 0.01,
maxit = 1000 ))
```
These have estimated time-series that are more similar to those from DSEM
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=3.5}
# Plots states using all data
plot(fit_MARSS2, plot.type="xtT")
```
We can now compare the three options in terms of the fitted log-likelihood:
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=7}
# Compare likelihood for MARSS and DSEM
Table = c( "MARSS" = logLik(fit_MARSS),
"DSEM" = logLik(mydfa),
"MARSS_no_pen" = logLik(fit_MARSS2) )
knitr::kable( Table, digits=3)
```
which confirms that the MARSS model without a penalty on initial conditions results in the same likelihood as DSEM. Finally, we can also compare the three options in terms of estimated loadings:
```{r, echo=TRUE, message=FALSE, fig.width=7, fig.height=7}
Table = cbind( "MARSS" = as.vector(fit_MARSS$par$Z),
"DSEM" = grab(mydfa$opt$par,"beta_z"),
"MARSS_no_pen" = as.vector(fit_MARSS2$par$Z) )
rownames(Table) = names(fit_MARSS$coef)[1:nrow(Table)]
knitr::kable( Table, digits=3)
```
The estimating loadings are similar using DSEM and the MARSS model without initial penalty, except with label switching (where some factors and loadings can be multiplied by -1 with no change in the model):