-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
R6 PCA wrappers #42
R6 PCA wrappers #42
Conversation
R/moma_expose.R
Outdated
|
||
# check if Omega is a matrix | ||
if (!is.matrix(Omega) && !is.null(Omega)) { | ||
moma_error("Omage_u/v is not a matrix.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Omega not Omage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
R/moma_expose.R
Outdated
moma_error("Omage_u/v is not a matrix.") | ||
} | ||
|
||
# store them as sparse matrices using the package Matrix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using sparse matrices here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is a TODO. See the new commit "Add TODO".
Before we can do a full review, all the R stuff needs at least basic documentation and to be added to the NAMESPACE using |
Yes, documentations are key to the package. However, I thought it would be our focus in the third coding phase so I did not write docs for these wrappers. Anyway, for the sake of this review, I added some basic documentations (they are far from the final version though). Also there are some use cases in tests suites. These should be enough for reviewing this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool design! Let's iterate it on a bit, but I like where it's going quite a lot!
@@ -72,29 +77,42 @@ Rcpp::List MoMA::criterion_search(const arma::vec &bic_au_grid, | |||
// We conduct 2 BIC searches over 2D grids here instead | |||
// of 4 searches over 1D grids. It's consistent with | |||
// Genevera's code. | |||
while (tol > EPS_bic && iter < max_bic_iter) | |||
if (n_au > 1 || n_av > 1 || n_lu > 1 || n_lv > 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to change this? I don't see how this additional loop connects to the PCA wrappers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If 'bic_au_grid' and so forth are all scalars instead of vectors, then there is no need for the searching process. So we jump directly to evaluating the final result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose - flip side being, if we are searching over a grid of length 1, the overhead is minimal, so is it worth the additional case in the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right - the overhead is negligible. But let's keep it.
R/moma_sfpca.R
Outdated
u_sparsity = empty(), v_sparsity = empty(), lambda_u = 0, lambda_v = 0, # lambda_u/_v is a vector or scalar | ||
Omega_u = NULL, Omega_v = NULL, alpha_u = 0, alpha_v = 0, # so is alpha_u/_v | ||
pg_setting = moma_pg_settings(), | ||
selection_scheme_str = "gggg", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit tricky for users. In general, we shouldn't be asking folks to remember the order of the 4 internal parameters.
What do you think about having a flag for selection inside the penalty objects (default TRUE)?
u_sparsity = lasso(select = FALSE) # GRID
u_sparsity = lasso(select = TRUE) # or maybe "BIC" does BIC selection
That seems to make intent clearer, but I don't know how nasty it would make the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Omega_u/_v
, the smoothing matrices? How to add the select
argument to it?
Besides, we will supplement with detailed documentations. So it shouldn't be very troublesome for users. I am thinking about adding a special doc for selection_scheme_str
and explaining the selection procedure here, so that users can find out about it by typing ? selection_scheme_str
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The work of changing to this design mainly lies in updating tests cases, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a similar sort of thing:
u_smoothness = second_order_difference(select = TRUE)
I used a similar pattern for fusion weights in the clustRviz package: it's a bit tricky to get your head around it, but check it out - might have some useful ideas.
Related, but separate, question: if we go this route, should parameter values be in the object or as a second argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem of using u_smoothness = second_order_difference(select = TRUE)
is that we have to provide a set of smoothness argument functions, just like what we did for lasso(), scad()
etc. However this comes at the cost of limiting users' choice of smoothing terms.
The reason why I did not make those argument functions in the first place is that the choice for Omega can be very flexible, e.g., smoothing matrices in generalized additive models.
As for the second question, I think we should absorb parameter values into helper functions like lasso()
, if we decide to do this. This would make our function look really nice though:
a <- moma_sfpca(X,
v_sparsity = lasso(lambda=seq(0,4), select = "bic"),
v_smoothness = sec_diff_mat(alpha=seq(0,2), select = "grid"))
I opened an issue to track this. #43
R/moma_sfpca.R
Outdated
selection_criterion_lambda_v = 0 | ||
) | ||
for (i in 1:4) { | ||
selection_scheme_list[[i]] <- ifelse(strsplit(selection_scheme_str, split = "")[[1]][i] == "g", 0, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The substr
function might simplify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. See commit "strsplit -> substr".
R/moma_sfpca.R
Outdated
self$selection_scheme_list <- selection_scheme_list | ||
|
||
# Step 1.7: check rank | ||
if (!inherits(rank, "numeric") || rank <= 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check for "integerish" here - see the is.wholenumber example function in ?is.integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. See commit "Add is.wholenumber(rank) and tests".
R/moma_sfpca.R
Outdated
) | ||
}, | ||
|
||
get_mat = function(alpha_u = 1, alpha_v = 1, lambda_u = 1, lambda_v = 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do the dots and require arguments by name her trick? Positional arguments seem too finicky and easy to get in the wrong order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. See commit "get_mat -> get_mat_by_id, and dots".
for (i in (1:self$rank)) { | ||
U[, i] <- | ||
get_5Dlist_elem(self$grid_result, | ||
alpha_u_i = alpha_u, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh - I really don't like specifying these as indices rather than values. How much work do you think #40 would be to get in place before merging this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What interpolation scheme do you have in mind? I am thinking about the 4-D version of bilinear interpolation. This can be done using the get_mat
, which retrieves results using position indices.
R/moma_sfpca.R
Outdated
list(coln, paste0("PC", seq_len(rank))) | ||
dimnames(U) <- | ||
list(rown, paste0("PC", seq_len(rank))) | ||
return(list(U = U, V = V)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this function called get_mat
? Is it for users to use? (I'm not super familiar with R6 visibility rules.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Short for "get the matrices U and V".
Yes, I am thinking exposing it. The user gets an SFPCA object by calling moma_spca, moma_fpca
etc., and call its member functions like get_mat
, left_project
and so on to play with it.
I think if we expose the R6 object all of its public methods get exposed too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation to expose it (by index) as opposed to by value?
R/moma_sfpca.R
Outdated
# newX should be uncencter and unscaled. | ||
# check new X has same colnames | ||
if (length(dim(newX)) != 2L) { | ||
moma_error("'newX' must be a matrix or data frame") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to support data frames in these functions. Things get super weird with non-numeric values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of supporting data frames is to make use of column names and row names. See here:
For example, the loadings (a matrix packed by left singular vectors, of dimension p x k), should have row names the same as column names of the data matrix. See here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But matrices can have both column and rownames too. (Against data frames, the tidyverse crowd is pushing back on having rownames for tibbles (their data frame subclass) so that's a bigger issue.)
For me, the bigger quesetion, however is what happens with non-numeric elements of data frames? Are we going to do a factor encoding automatically or just error out? Requiring a matrix avoids this question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as.matrix
is.finite
deals with non-numeric elements.
https://github.com/Banana1530/MoMA-1/blob/ef9bb20acd71b5c43eed13d70d1a0fecf70037e7/R/moma_sfpca.R#L108
I also added a test in commit "Add test "SFPCA object: X contains string""
test_that("SFPCA object: X contains string", {
set.seed(12)
SFPCA$debug("initialize")
X <- matrix(seq(1:12), 4, 3)
X[1, 1] <- "abc"
expect_error(SFPCA$new(X), "X must not have NaN, NA, or Inf.")
})
R/moma_sfpca.R
Outdated
moma_error("`newX` is incompatible with orignal data.") | ||
} | ||
|
||
PV <- solve((t(V) %*% V), t(V)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t(V) %%*% V
is just crossprod(V)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes. I always forgot to use these little helpers.
Fixed. See commit "t(V) %%*% V -> crossprod(V)".
} | ||
} | ||
else | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The else
statement here keeps the format of u/v_result
consistent in both cases. They should be vectors containing Rcpp::Named("lambda"), Rcpp::Named("alpha"), Rcpp::Named("vector"), Rcpp::Named("bic")
.
#' @export | ||
SFPCA <- R6::R6Class("SFPCA", | ||
private = list( | ||
check_input_index = TRUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check_input_index
works as a toggle. When interpolate
calls private_get_mat_by_index
, we turn off input checking to make things easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by this: I don't need that we need a fast path, but if we do, maybe it's easier to have a (public) function which does input checking and then passes along to a (private) function that doesn't input check but does the actual work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for a "fast path" is that what parameters can be specified in the function get_mat_by_id
is dependent on how SFPCA is initialized. For example,
a1 <- moma_sfpca(X)
a1$get_mat_by_id() # OK
a1$get_mat_by_id(alpha_u = 1) # not allowed, it is redundant
a1 <- moma_sfpca(X, lambda_u=seq(0,10), u_sparsity=lasso())
a1$get_mat_by_id() # error! lambda_u not specified
a1$get_mat_by_id(lambda_u = 2) # OK
a1$get_mat_by_id(lambda_u = 2, alpha_u = 1) # error too, alpha_u is redundant
So calling get_mat_by_id
internally will be a pain. This is why we have the toggle check_input_index
. When get_mat_by_id
is called by internal function, it is set to FALSE. See:
R/moma_sfpca.R
Outdated
|
||
if (private$check_input_index) { | ||
missing_list <- list(missing(alpha_u), missing(alpha_v), missing(lambda_u), missing(lambda_v)) | ||
if (any(self$fixed_list == TRUE & missing_list != TRUE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Fixed" parameters are those
i) that are chosen by BIC, or ,
ii) that are not specified during initialization of the SFCPA object, or
iii) that are scalars during initialization of the SFCPA object.
For example,
a <- moma_sfpca(X, lambda_u = 1.2, lambda_v = 1:10, v_sparsity = u_sparsity = lasso())
Then a$get_mat_by_id(lambda_v = 3)
is valid, while a$get_mat_by_id(lambda_v = 3 , lambda_u = 1)
is redundant and should give a warning (actually an error in code). a$get_mat_by_id(lambda_v = 3 , lambda_u = 2)
is wrong because the lambda_u grid is of length 1.
Another example:
# lambda_v is selected by BIC
a <- moma_sfpca(X, lambda_v = 1:10, v_sparsity =lasso(), selecthion_scheme_str = "ggbg")
Then a$get_mat_by_id(lambda_v = 1)
is redundant and will give an error in current design.
Another example:
a <- moma_sfpca(X, lambda_v = 1, lambda_u = 2, alpha_v =3, alpha_u = 4, lasso()....)
Then a$get_mat_by_id(alpha_u=1,alpha_v=1,lambda_u=1,lambda_v=1)
will give an error, though it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable.
} | ||
|
||
# a bool: want to interpolate on u side? | ||
inter_u <- !missing(alpha_u) && !missing(lambda_u) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are posing a lot of conditions here.
An example that meets these conditions is
a <- moma_sfpca(X,
u_sparsity = lasso(), v_sparsity = lasso(),
alpha_v = seq(0.1, 1, 0.15), alpha_u = 0.32,
Omega_v = second_diff_mat(8), Omega_u = second_diff_mat(17),
lambda_v = seq(0.1, 1, 0.2), lambda_u = 2.1
)
a$interpolate(alpha_v = 0.23, lambda_v = 0.121)
U <- 0.5 * (result_lo$U + result_hi$U) | ||
V <- 0.5 * (result_lo$V + result_hi$V) | ||
|
||
# newSv <- diag(self$p) + alpha_v * self$Omega_v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't found a closed-form solution for projection onto an ellipsoid so far.
Note columns of U and V are not of norm 1. But since direction is what we care so I'll leave it to users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. Let me think more about this - I'm pretty busy for the next few days, so we can leave this as an issue and come back to it in a different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) | ||
}) | ||
|
||
test_that("SFPCA object: interpolate, exact mode", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review this part first to see some use cases.
tests/testthat/test_sfpca_wrapper.R
Outdated
) | ||
expect_error( | ||
SFPCA$new(X, rank = 3.1, center = FALSE, scale = FALSE), | ||
"rank should be a legit positive integer" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe drop "legit" - it's a bit casual.
I also tend to put argument names (here rank) in backticks to make them look like code (at least to someone who knows markdown syntax)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test now becomes:
expect_error(
SFPCA$new(X, rank = "3.1", center = FALSE, scale = FALSE),
"`rank` should be a positive integer smaller than the rank of the data matrix."
)
svda <- svd(X, nu = 3, nv = 3) | ||
|
||
expect_equal(abs(mysvd$U), abs(svda$u), check.attributes = FALSE) # use `abs` to avoid sign difference | ||
expect_equal(abs(mysvd$V), abs(svda$v), check.attributes = FALSE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we keep singular values as well? If so, let's test them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
tests/testthat/test_sfpca_wrapper.R
Outdated
|
||
|
||
# test default arguments | ||
expect_equal(a$get_mat_by_index(), a$get_mat_by_index()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this line testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it was a mistake. It is now:
a <- SFPCA$new(X, rank = 3, center = FALSE, scale = FALSE)
mysvd <- a$get_mat_by_index()
expect_equal(rownames(mysvd$U), rown)
expect_equal(rownames(mysvd$V), coln)
# `a` is specified without
# any parameters (all four parameters default
# to 0 actually), so users must not specify any
# parameters when calling `get_mat_by_index`.
# A bit stringent though.
expect_error(
a$get_mat_by_index(alpha_u = 1),
"Invalid index in SFPCA::get_mat_by_index."
)
expect_error(
a$get_mat_by_index(alpha_v = 1),
"Invalid index in SFPCA::get_mat_by_index."
)
expect_error(
a$get_mat_by_index(lambda_u = 1),
"Invalid index in SFPCA::get_mat_by_index."
)
expect_error(
a$get_mat_by_index(lambda_v = 1),
"Invalid index in SFPCA::get_mat_by_index."
)
expect_no_error(
a$get_mat_by_index(),
"Invalid index in SFPCA::get_mat_by_index."
)
tests/testthat/test_sfpca_wrapper.R
Outdated
) | ||
expect_error( | ||
a$left_project(matrix(0, 4, 1)), | ||
"`newX` is incompatible with orignal data." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we say what the correct dimensions should be in this error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
) | ||
expect_warning( | ||
a$get_mat_by_index(alphau = 1), | ||
paste0("extra argument ", sQuote("alphau"), " will be disregarded") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"alphau" -> "alpha_u"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this one is intended. When users have a typo or give other arguments we don't use, we give an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Do you want a warning or a full error here? (I can go either way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it should be just a warning. I made a mistake when I said "we give an error".
tests/testthat/test_sfpca_wrapper.R
Outdated
|
||
|
||
test_that("SFPCA object wrappers: moma_twfpca", { | ||
X <- matrix(runif(17 * 8), 17, 8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to have set.seed
in all tests with randomness, just in case something weird happens on Travis that needs to be debugged locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a set.seed
at the beginning of this file already.
Added anyway.
tests/testthat/test_sfpca_wrapper.R
Outdated
) | ||
expect_error( | ||
a$interpolate(), | ||
"R6 ojbect SFPCA do not support interpolation when BIC selection scheme has been used." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"object"
Also - I'm a bit confused by this one: shouldn't the error message reflect the fact that we aren't saying where interpolation is happening? The BIC thing, while valid, seems like a bit of a fakeout. (I.e., if you tell me to interpolate from grid results, but don't give parameter values, I'd still expect an error.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a <- moma_sfpca(X,
u_sparsity = lasso(), v_sparsity = lasso(),
lambda_u = c(1, 2), lambda_v = c(1, 2),
alpha_u = c(1, 2),
alpha_v = c(1, 2), # selected by BIC
Omega_u = second_diff_mat(17), Omega_v = second_diff_mat(8),
selection_scheme_str = "gggb"
)
expect_error(
a$interpolate(),
"R6 object SFPCA do not support interpolation when BIC selection scheme has been used."
)
expect_error(
a$interpolate(
lambda_u = 1.1, lambda_v = 1.3,
alpha_u = 1.4
),
"R6 object SFPCA do not support interpolation when BIC selection scheme has been used."
)
Current design does not support interpolation if BIC scheme has been used for any parameters because I find it hard to make sense of it.
a$interpolate( | ||
alpha_u = 1, exact = TRUE | ||
), | ||
"Invalid index in SFPCA::interpolate: alpha_u." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to throw an error for exact = TRUE
mode? If we're already (theoretically) re-solving, what harm is there from introducing a new parameter value ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified this part to make it clearer.
# Once BIC is used, `SFPCA::interpolate`
# cannot be called at all.
a <- moma_sfpca(X,
u_sparsity = lasso(), v_sparsity = lasso(),
lambda_u = c(1, 2), lambda_v = c(1, 2),
alpha_u = c(1, 2),
alpha_v = c(1, 2), # selected by BIC
Omega_u = second_diff_mat(17), Omega_v = second_diff_mat(8),
selection_scheme_str = "gggb"
)
expect_error(
a$interpolate(),
"R6 object SFPCA do not support interpolation when BIC selection scheme has been used."
)
expect_error(
a$interpolate(
lambda_u = 1.1, lambda_v = 1.3,
alpha_u = 1.4
),
"R6 object SFPCA do not support interpolation when BIC selection scheme has been used."
)
Current design does not support interpolation if BIC scheme has been used for any parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. We might want to relax this at some point in the future, but makes sense now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor things, but I think this looks good to merge. Let me know when to hit the approve button and merge it.
@@ -55,8 +55,8 @@ sfpca <- function(X, | |||
lambda_u <- as.vector(lambda_u) | |||
lambda_v <- as.vector(lambda_v) | |||
|
|||
Omega_u <- if (is.null(Omega_u)) diag(dim(X)[1]) else Omega_u | |||
Omega_v <- if (is.null(Omega_v)) diag(dim(X)[2]) else Omega_v | |||
Omega_u <- Omega_u %||% diag(dim(X)[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
tests/testthat/test_sfpca_wrapper.R
Outdated
set.seed(12) | ||
X <- matrix(seq(1:12), 4, 3) | ||
X[1, 1] <- "abc" | ||
expect_error(SFPCA$new(X), "X must not have NaN, NA, or Inf.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem quite right as an error message / test, though I agree it's an error. Maybe add a different (separate) check that is.double(X)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
expect_equal(a$v_sparsity, empty()) | ||
|
||
expect_error( | ||
SFPCA$new(matrix(runif(12), 3, 4), selection_scheme_str = "bbba"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) | ||
expect_error( | ||
SFPCA$new(X, rank = 0, center = FALSE, scale = FALSE), | ||
"`rank` should be a positive integer smaller than the rank of the data matrix." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we actually checking the "real" rank here (e.g., Matrix::rankMatrix) or just min(n, p)
? Since we aren't using real singular vectors, it is possible to have a higher SFPCA-rank than real-rank.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just min(n, p).
) | ||
expect_warning( | ||
a$get_mat_by_index(alphau = 1), | ||
paste0("extra argument ", sQuote("alphau"), " will be disregarded") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Do you want a warning or a full error here? (I can go either way)
|
||
# test selection schemes | ||
expect_error( | ||
moma_spca(X, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does moma_spca
only allow for sparse U? In general, Sparse PCA refers to sparse V because most Sparse PCA is defined in the covariance model not the SVD model.
In fact, I'd almost think about making moma_spca
only do sparse V. I can't really think of a standard method that is sparse U only...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then should we remove twspca
?
I'd provide a slightly broader range of choices and let the user make the decision.
expect_true(all(a$selection_scheme_list == c(0, 0, 0, 0))) | ||
|
||
a <- moma_fpca(X, | ||
alpha_u = seq(0, 2, 0.2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above - folks might only expect FPCA to do smooth V. Let's do the same thing here as we decide for moma_spca.
a$interpolate( | ||
alpha_u = 1, exact = TRUE | ||
), | ||
"Invalid index in SFPCA::interpolate: alpha_u." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. We might want to relax this at some point in the future, but makes sense now.
Also, let me know if you want to rebase / clean up the commit history before merging. |
Codecov Report
@@ Coverage Diff @@
## master #42 +/- ##
==========================================
- Coverage 90.22% 89.98% -0.25%
==========================================
Files 30 32 +2
Lines 2016 2487 +471
==========================================
+ Hits 1819 2238 +419
- Misses 197 249 +52
Continue to review full report at Codecov.
|
No description provided.