-
Notifications
You must be signed in to change notification settings - Fork 0
/
sure_example.Rmd
188 lines (159 loc) · 9.42 KB
/
sure_example.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
---
title: "HOSE Examples"
author: "David Gerard"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{HOSE Examples}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
## Abstract
This vignette will run through a couple examples on calculating higher-order spectral estimators (HOSE)
and using SURE as an estimator selection procedure.
## Generate Simulated Data.
First, we'll load the `hose` package and create some simulated data whose mean is actually low multilinear rank. `X` is a $10 \times 10 \times 10$ tensor that has multilinear rank $2 \times 2 \times 2$. We use the `holq` function from the `tensr` package to simulate a tensor whose first two singular values along each mode are approximately equal.
```{r}
library(tensr)
library(hose)
set.seed(328)
p <- c(10, 10, 10)
r <- c(2, 2, 2)
n <- length(p)
tau2 <- 1
fnorm_mean <- sqrt(prod(p)) * sqrt(tau2) / 2 ## so on same order as error
U_temp <- tensr::hosvd(array(stats::rnorm(prod(p)), dim = p))$U
U2_temp <- list()
S_temp <- tensr::holq(array(stats::rnorm(prod(p)), dim = r), print_diff = FALSE)$Z
for(index in 1:n) {
U2_temp[[index]] <- U_temp[[index]][ , 1:r[index]]
}
## Our mean tensor.
Theta <- tensr::atrans(S_temp, U2_temp)
Theta <- Theta / tensr::fnorm(Theta) * fnorm_mean
hosvd_Theta <- hose::hosvd_full(Theta)
## Our data tensor
X <- Theta + array(stats::rnorm(n = prod(p), sd = sqrt(tau2)), dim = p)
```
Some scree plots of the three matricizations of the data `X` do not seem to suggest that the mean tensor has low multilinear rank.
```{r}
## scree plots of the modes. Doesn't look low rank.
hosvd_x <- hosvd_full(X)
par(cex.lab = 0.7, cex.main = 0.7, cex.axis = 0.7, mgp = c(1.2, 0.4, 0),
mar = c(2.2, 2, 1.8, 0.5))
for(index in 1:n) {
plot(hosvd_x$D[[index]], type = "h", ylim = c(0, max(hosvd_x$D[[index]])),
xlab = "Singular Value", ylab = "Value",
main = paste0("Scree Plot for Mode ", index))
}
```
## Estimating the Variance.
All of these SURE-based mean estimators require that the variance be known. Since in most real-world applications, the variance is not known, I've provided a few methods to non-parametrically estimate the variance. These methods are implemented in the function `tensor_var_est`. The procedures available were all designed for matrix-variate data when the mean is low rank, so `tensor_var_est` applies these methods to each matricization of the data tensor and averages the resulting estimators. I'll describe two of these methods now.
The method that seems to work the best in general situations is that of Choi et al (2014). They use a degrees of freedom corrected residual mean-squared error of a soft-thresholding estimator of the mean where the tuning parameter is chosen by a procedure akin to cross-validation. Use the `sig_method = "soft"` option in `tensor_var_est` to use this method. Note that sometimes the `softImpute` function will not converge and return a warning. The main downside to this method is that it can be very slow.
```{r, warning=FALSE}
sig_soft_out <- tensor_var_est(X, sig_method = "soft")
print(sig_soft_out$sig2_est)
```
A method that works very well when the multilinear rank is very small and works very poorly otherwise is that of Gavish and Donoho (2014). Their estimator is based on an asymptotic framework where the rank of the mean and the signal to noise ratio remain constant while the size of the matrix increases. Since 2 (the rank) is much less than 10 (the dimension size), this procedure might work here. Use the option `sig_method = "mp"` in `tensor_var_est` to implement this method.
```{r}
sig_mp_out <- tensor_var_est(X, sig_method = "mp")
print(sig_mp_out$sig2_est)
```
We'll use the estimator from `sig_mp_out` as our variance in what follows.
```{r}
tau2_est <- sig_mp_out$sig2_est
```
## The Mode-specific Truncation Estimator.
First we'll look at mode-specific truncation as a multilinear rank selection procedure, running through each possible multilinear rank using the function `sure_rank`.
```{r}
sure_rank_out <- sure_rank(X, tau2 = tau2_est)
```
`sure_rank` returns the multilinear rank that minimizes the SURE.
```{r}
sure_rank_out$min_rank ## Estimated multilinear rank. Should be close to (2, 2, 2).
```
The profiled SURE for each mode demonstrates that $2 \times 2 \times 2$ is indeed the value that minimizes the SURE.
```{r}
par(cex.lab = 0.7, cex.main = 0.7, cex.axis = 0.7, mgp = c(1.2, 0.4, 0),
mar = c(2.2, 2, 1.8, 0.5), pch = ".")
marg_1 <- sure_rank_out$all_ranks[,2] == sure_rank_out$min_rank[2] &
sure_rank_out$all_ranks[,3] == sure_rank_out$min_rank[3]
plot(sure_rank_out$all_ranks[,1], sure_rank_out$sure,
xlab = "Mode 1 Rank", ylab = "SURE", main = "Mode 1 Rank vs SURE")
lines(sure_rank_out$all_ranks[marg_1,1], sure_rank_out$sure[marg_1],
lty = 2, col = 2, lwd = 2)
legend("topright", "Other Modes at Min Value", lty = 2, col = 2,
lwd = 2, cex = 0.7)
marg_2 <- sure_rank_out$all_ranks[,1] == sure_rank_out$min_rank[1] &
sure_rank_out$all_ranks[,3] == sure_rank_out$min_rank[3]
plot(sure_rank_out$all_ranks[,2], sure_rank_out$sure,
xlab = "Mode 2 Rank", ylab = "SURE", main = "Mode 2 Rank vs SURE")
lines(sure_rank_out$all_ranks[marg_2,2], sure_rank_out$sure[marg_2],
lty = 2, col = 2, lwd = 2)
legend("topright", "Other Modes at Min Value", lty = 2, col = 2,
lwd = 2, cex = 0.7)
marg_3 <- sure_rank_out$all_ranks[,1] == sure_rank_out$min_rank[1] &
sure_rank_out$all_ranks[,2] == sure_rank_out$min_rank[2]
plot(sure_rank_out$all_ranks[,3], sure_rank_out$sure,
xlab = "Mode 3 Rank", ylab = "SURE", main = "Mode 3 Rank vs SURE")
lines(sure_rank_out$all_ranks[marg_3,3], sure_rank_out$sure[marg_3],
lty = 2, col = 2, lwd = 2)
legend("topright", "Other Modes at Min Value", lty = 2, col = 2,
lwd = 2, cex = 0.7)
```
## Mode-specific Soft-thresholding Estimator and Competitors.
We'll switch gears now and fit a mode-specific soft-thresholding estimator. The optimal parameter values for the mode-specific soft-thresholding estimator can be found via `soft_coord`. The final mean estimate is `soft_coord$est`.
```{r}
c_obj <- get_c(X)
lambda_init <- rep(0, n)
c_init <- 1
soft_out <- soft_coord(c_obj, lambda_init = lambda_init, c_init = c_init,
use_sure = TRUE, print_iter = FALSE, tau2 = tau2_est, itermax = 20)
```
We'll compare the scree plots of the mean $\Theta$, the data $X$, and mode-specific soft-thresholding estimator.
```{r}
## scree plots of resulting estimator, X, and Theta
hosvd_soft <- hosvd_full(soft_out$est)
par(cex.lab = 0.7, cex.main = 0.7, cex.axis = 0.7, mgp = c(1.2, 0.4, 0),
mar = c(2.2, 2, 1.8, 0.5), pch = ".")
for(mode_index in 1:n) {
plot(hosvd_soft$D[[mode_index]], type = "h", xlab = "Singular Value",
ylab = "Value", main = paste0("Mode ", mode_index, " Scree Plot"),
ylim = c(0, max(c(hosvd_soft$D[[mode_index]],
hosvd_x$D[[mode_index]],
hosvd_Theta$D[[mode_index]]))))
lines(1:p[mode_index] + 0.2, hosvd_x$D[[mode_index]], type = "h", lty = 2)
lines(1:p[mode_index] + 0.4, hosvd_Theta$D[[mode_index]], type = "h",
lty = 3)
legend("topright", c("Soft", "X", "Theta"), lty = 1:3, cex = 0.7)
}
```
We'll compare our tensor estimator to a couple matrix and vector competitors. We'll look at the soft-thresholding estimator of Candes et al (2013), Stein's estimator (Stein, 1961), and a variant of the estimator of Efron and Morris (1972) that shrinks one mode, but chooses that mode by SURE.
```{r}
svd_mode1 <- svd(tensr::mat(X, 1))
mat_optim <- optim(1, fn = sure_matrix, d = svd_mode1$d,
p_dim = c(p[1], prod(p[-1])), tau2 = tau2_est, method = "Brent",
lower = 0, upper = svd_mode1$d[1])
mat_est <- array(svd_mode1$u %*%
diag(hose:::pos_part(svd_mode1$d - mat_optim$par)) %*%
t(svd_mode1$v), dim = p)
final_stein <- stein(X, tau2_est) ## Stein's estimator.
final_em <- min_ef(X, tau2_est) ## Min Efron-Morris estimator.
```
```{r}
cat(
" Losses:", "\n",
" Truncated HOSVD:", round(tensr::fnorm(sure_rank_out$est - Theta)), "\n",
"Tensor Soft Thresholding:", round(tensr::fnorm(soft_out$est - Theta)), "\n",
"Matrix Soft Thresholding:", round(tensr::fnorm(mat_est - Theta)), "\n",
" Efron-Morris:", round(tensr::fnorm(final_em$ef_est - Theta)), "\n",
" Stein:", round(tensr::fnorm(final_stein$est - Theta)), "\n",
" X:", round(tensr::fnorm(X - Theta)), "\n\n")
```
## References
Candes, E., Sing-Long, C. A., & Trzasko, J. D. (2013). Unbiased risk estimates for singular value thresholding and spectral estimators. *Signal Processing, IEEE Transactions on*, 61(19), 4643-4657.
Choi, Y., Taylor, J., & Tibshirani, R. (2014). Selecting the number of principal components: Estimation of the true rank of a noisy matrix. *arXiv preprint* arXiv:1410.8260.
Efron, B., & Morris, C. (1972). Empirical Bayes on vector observations: An extension of Stein's method. *Biometrika*, 59(2), 335-347.
Gavish, M., & Donoho, D. L. (2014). The optimal hard threshold for singular values is 4/sqrt(3). *Information Theory, IEEE Transactions on*, 60(8), 5040-5053.
Gerard, D., & Hoff, P. (2015). Adaptive Higher-order Spectral Estimators. *arXiv preprint* arXiv:1505.02114.
James, W., & Stein, C. (1961, June). Estimation with quadratic loss. *In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability* (Vol. 1, No. 1961, pp. 361-379).