Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R6 PCA wrappers #42

Merged
merged 34 commits into from
Aug 5, 2019
Merged

Conversation

Banana1530
Copy link
Collaborator

No description provided.

R/moma_expose.R Outdated

# check if Omega is a matrix
if (!is.matrix(Omega) && !is.null(Omega)) {
moma_error("Omage_u/v is not a matrix.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omega not Omage

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

R/moma_expose.R Outdated
moma_error("Omage_u/v is not a matrix.")
}

# store them as sparse matrices using the package Matrix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we using sparse matrices here?

Copy link
Collaborator Author

@Banana1530 Banana1530 Jul 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is a TODO. See the new commit "Add TODO".

@michaelweylandt
Copy link
Member

Before we can do a full review, all the R stuff needs at least basic documentation and to be added to the NAMESPACE using roxygen2's @export directive. I'm not 100% sure what all needs to be exported on the R6 side but I'd guess Hadley's "R Packages" book will discuss what needs exporting (methods or just our pseudo-S3 helper functions).

@Banana1530
Copy link
Collaborator Author

Yes, documentations are key to the package. However, I thought it would be our focus in the third coding phase so I did not write docs for these wrappers.

Anyway, for the sake of this review, I added some basic documentations (they are far from the final version though). Also there are some use cases in tests suites. These should be enough for reviewing this PR.

Copy link
Member

@michaelweylandt michaelweylandt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool design! Let's iterate it on a bit, but I like where it's going quite a lot!

@@ -72,29 +77,42 @@ Rcpp::List MoMA::criterion_search(const arma::vec &bic_au_grid,
// We conduct 2 BIC searches over 2D grids here instead
// of 4 searches over 1D grids. It's consistent with
// Genevera's code.
while (tol > EPS_bic && iter < max_bic_iter)
if (n_au > 1 || n_av > 1 || n_lu > 1 || n_lv > 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to change this? I don't see how this additional loop connects to the PCA wrappers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 'bic_au_grid' and so forth are all scalars instead of vectors, then there is no need for the searching process. So we jump directly to evaluating the final result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose - flip side being, if we are searching over a grid of length 1, the overhead is minimal, so is it worth the additional case in the code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right - the overhead is negligible. But let's keep it.

R/moma_sfpca.R Outdated
u_sparsity = empty(), v_sparsity = empty(), lambda_u = 0, lambda_v = 0, # lambda_u/_v is a vector or scalar
Omega_u = NULL, Omega_v = NULL, alpha_u = 0, alpha_v = 0, # so is alpha_u/_v
pg_setting = moma_pg_settings(),
selection_scheme_str = "gggg",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit tricky for users. In general, we shouldn't be asking folks to remember the order of the 4 internal parameters.

What do you think about having a flag for selection inside the penalty objects (default TRUE)?

u_sparsity = lasso(select = FALSE) # GRID
u_sparsity = lasso(select = TRUE) # or maybe "BIC" does BIC selection

That seems to make intent clearer, but I don't know how nasty it would make the code.

Copy link
Collaborator Author

@Banana1530 Banana1530 Jul 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Omega_u/_v, the smoothing matrices? How to add the select argument to it?

Besides, we will supplement with detailed documentations. So it shouldn't be very troublesome for users. I am thinking about adding a special doc for selection_scheme_str and explaining the selection procedure here, so that users can find out about it by typing ? selection_scheme_str.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The work of changing to this design mainly lies in updating tests cases, I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a similar sort of thing:

u_smoothness = second_order_difference(select = TRUE)

I used a similar pattern for fusion weights in the clustRviz package: it's a bit tricky to get your head around it, but check it out - might have some useful ideas.

Related, but separate, question: if we go this route, should parameter values be in the object or as a second argument?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem of using u_smoothness = second_order_difference(select = TRUE) is that we have to provide a set of smoothness argument functions, just like what we did for lasso(), scad() etc. However this comes at the cost of limiting users' choice of smoothing terms.

The reason why I did not make those argument functions in the first place is that the choice for Omega can be very flexible, e.g., smoothing matrices in generalized additive models.

As for the second question, I think we should absorb parameter values into helper functions like lasso(), if we decide to do this. This would make our function look really nice though:

a <- moma_sfpca(X,
             v_sparsity = lasso(lambda=seq(0,4), select = "bic"),
             v_smoothness = sec_diff_mat(alpha=seq(0,2), select = "grid"))

I opened an issue to track this. #43

R/moma_sfpca.R Outdated
selection_criterion_lambda_v = 0
)
for (i in 1:4) {
selection_scheme_list[[i]] <- ifelse(strsplit(selection_scheme_str, split = "")[[1]][i] == "g", 0, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The substr function might simplify this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. See commit "strsplit -> substr".

R/moma_sfpca.R Outdated
self$selection_scheme_list <- selection_scheme_list

# Step 1.7: check rank
if (!inherits(rank, "numeric") || rank <= 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check for "integerish" here - see the is.wholenumber example function in ?is.integer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. See commit "Add is.wholenumber(rank) and tests".

R/moma_sfpca.R Outdated
)
},

get_mat = function(alpha_u = 1, alpha_v = 1, lambda_u = 1, lambda_v = 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do the dots and require arguments by name her trick? Positional arguments seem too finicky and easy to get in the wrong order.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. See commit "get_mat -> get_mat_by_id, and dots".

for (i in (1:self$rank)) {
U[, i] <-
get_5Dlist_elem(self$grid_result,
alpha_u_i = alpha_u,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh - I really don't like specifying these as indices rather than values. How much work do you think #40 would be to get in place before merging this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What interpolation scheme do you have in mind? I am thinking about the 4-D version of bilinear interpolation. This can be done using the get_mat, which retrieves results using position indices.

R/moma_sfpca.R Outdated
list(coln, paste0("PC", seq_len(rank)))
dimnames(U) <-
list(rown, paste0("PC", seq_len(rank)))
return(list(U = U, V = V))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function called get_mat? Is it for users to use? (I'm not super familiar with R6 visibility rules.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short for "get the matrices U and V".

Yes, I am thinking exposing it. The user gets an SFPCA object by calling moma_spca, moma_fpca etc., and call its member functions like get_mat, left_project and so on to play with it.

I think if we expose the R6 object all of its public methods get exposed too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation to expose it (by index) as opposed to by value?

R/moma_sfpca.R Outdated
# newX should be uncencter and unscaled.
# check new X has same colnames
if (length(dim(newX)) != 2L) {
moma_error("'newX' must be a matrix or data frame")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to support data frames in these functions. Things get super weird with non-numeric values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of supporting data frames is to make use of column names and row names. See here:

https://github.com/Banana1530/MoMA-1/blob/f37529dd4492813c9ab80978b4bc9b8fc158cd79/R/moma_sfpca.R#L235

For example, the loadings (a matrix packed by left singular vectors, of dimension p x k), should have row names the same as column names of the data matrix. See here:

https://github.com/Banana1530/MoMA-1/blob/f37529dd4492813c9ab80978b4bc9b8fc158cd79/R/moma_sfpca.R#L237

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But matrices can have both column and rownames too. (Against data frames, the tidyverse crowd is pushing back on having rownames for tibbles (their data frame subclass) so that's a bigger issue.)

For me, the bigger quesetion, however is what happens with non-numeric elements of data frames? Are we going to do a factor encoding automatically or just error out? Requiring a matrix avoids this question.

Copy link
Collaborator Author

@Banana1530 Banana1530 Jul 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as.matrix is.finite deals with non-numeric elements.
https://github.com/Banana1530/MoMA-1/blob/ef9bb20acd71b5c43eed13d70d1a0fecf70037e7/R/moma_sfpca.R#L108

I also added a test in commit "Add test "SFPCA object: X contains string""

test_that("SFPCA object: X contains string", {
    set.seed(12)
    SFPCA$debug("initialize")
    X <- matrix(seq(1:12), 4, 3)
    X[1, 1] <- "abc"
    expect_error(SFPCA$new(X), "X must not have NaN, NA, or Inf.")
})

R/moma_sfpca.R Outdated
moma_error("`newX` is incompatible with orignal data.")
}

PV <- solve((t(V) %*% V), t(V))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t(V) %%*% V is just crossprod(V)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes. I always forgot to use these little helpers.

Fixed. See commit "t(V) %%*% V -> crossprod(V)".

}
}
else
{
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else statement here keeps the format of u/v_result consistent in both cases. They should be vectors containing Rcpp::Named("lambda"), Rcpp::Named("alpha"), Rcpp::Named("vector"), Rcpp::Named("bic").

#' @export
SFPCA <- R6::R6Class("SFPCA",
private = list(
check_input_index = TRUE,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check_input_index works as a toggle. When interpolate calls private_get_mat_by_index, we turn off input checking to make things easier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this: I don't need that we need a fast path, but if we do, maybe it's easier to have a (public) function which does input checking and then passes along to a (private) function that doesn't input check but does the actual work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for a "fast path" is that what parameters can be specified in the function get_mat_by_id is dependent on how SFPCA is initialized. For example,

a1 <- moma_sfpca(X)
a1$get_mat_by_id() # OK
a1$get_mat_by_id(alpha_u = 1) # not allowed, it is redundant

a1 <- moma_sfpca(X, lambda_u=seq(0,10), u_sparsity=lasso())
a1$get_mat_by_id() # error! lambda_u not specified
a1$get_mat_by_id(lambda_u = 2) # OK
a1$get_mat_by_id(lambda_u = 2, alpha_u = 1) # error too, alpha_u is redundant

So calling get_mat_by_id internally will be a pain. This is why we have the toggle check_input_index . When get_mat_by_id is called by internal function, it is set to FALSE. See:

https://github.com/Banana1530/MoMA-1/blob/ef9bb20acd71b5c43eed13d70d1a0fecf70037e7/R/moma_sfpca.R#L48

R/moma_sfpca.R Outdated

if (private$check_input_index) {
missing_list <- list(missing(alpha_u), missing(alpha_v), missing(lambda_u), missing(lambda_v))
if (any(self$fixed_list == TRUE & missing_list != TRUE)) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Fixed" parameters are those
i) that are chosen by BIC, or ,
ii) that are not specified during initialization of the SFCPA object, or
iii) that are scalars during initialization of the SFCPA object.

For example,

a <- moma_sfpca(X, lambda_u = 1.2, lambda_v = 1:10, v_sparsity = u_sparsity = lasso())

Then a$get_mat_by_id(lambda_v = 3) is valid, while a$get_mat_by_id(lambda_v = 3 , lambda_u = 1) is redundant and should give a warning (actually an error in code). a$get_mat_by_id(lambda_v = 3 , lambda_u = 2) is wrong because the lambda_u grid is of length 1.

Another example:

# lambda_v is selected by BIC
a <- moma_sfpca(X, lambda_v = 1:10, v_sparsity =lasso(), selecthion_scheme_str = "ggbg")

Then a$get_mat_by_id(lambda_v = 1) is redundant and will give an error in current design.

Another example:

a <- moma_sfpca(X, lambda_v = 1, lambda_u = 2, alpha_v =3, alpha_u = 4, lasso()....)

Then a$get_mat_by_id(alpha_u=1,alpha_v=1,lambda_u=1,lambda_v=1) will give an error, though it makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable.

}

# a bool: want to interpolate on u side?
inter_u <- !missing(alpha_u) && !missing(lambda_u) &&
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are posing a lot of conditions here.

An example that meets these conditions is

a <- moma_sfpca(X,
        u_sparsity = lasso(), v_sparsity = lasso(),
        alpha_v = seq(0.1, 1, 0.15), alpha_u = 0.32,
        Omega_v = second_diff_mat(8), Omega_u = second_diff_mat(17),
        lambda_v = seq(0.1, 1, 0.2), lambda_u = 2.1
    )

a$interpolate(alpha_v = 0.23, lambda_v = 0.121)

U <- 0.5 * (result_lo$U + result_hi$U)
V <- 0.5 * (result_lo$V + result_hi$V)

# newSv <- diag(self$p) + alpha_v * self$Omega_v
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found a closed-form solution for projection onto an ellipsoid so far.

Note columns of U and V are not of norm 1. But since direction is what we care so I'll leave it to users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. Let me think more about this - I'm pretty busy for the next few days, so we can leave this as an issue and come back to it in a different PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#44

)
})

test_that("SFPCA object: interpolate, exact mode", {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review this part first to see some use cases.

)
expect_error(
SFPCA$new(X, rank = 3.1, center = FALSE, scale = FALSE),
"rank should be a legit positive integer"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe drop "legit" - it's a bit casual.

I also tend to put argument names (here rank) in backticks to make them look like code (at least to someone who knows markdown syntax)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test now becomes:

    expect_error(
        SFPCA$new(X, rank = "3.1", center = FALSE, scale = FALSE),
        "`rank` should be a positive integer smaller than the rank of the data matrix."
    )

svda <- svd(X, nu = 3, nv = 3)

expect_equal(abs(mysvd$U), abs(svda$u), check.attributes = FALSE) # use `abs` to avoid sign difference
expect_equal(abs(mysvd$V), abs(svda$v), check.attributes = FALSE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we keep singular values as well? If so, let's test them here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.



# test default arguments
expect_equal(a$get_mat_by_index(), a$get_mat_by_index())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this line testing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it was a mistake. It is now:

    a <- SFPCA$new(X, rank = 3, center = FALSE, scale = FALSE)
    mysvd <- a$get_mat_by_index()

    expect_equal(rownames(mysvd$U), rown)
    expect_equal(rownames(mysvd$V), coln)


    # `a` is specified without
    # any parameters (all four parameters default
    # to 0 actually), so users must not specify any
    # parameters when calling `get_mat_by_index`.
    # A bit stringent though.
    expect_error(
        a$get_mat_by_index(alpha_u = 1),
        "Invalid index in SFPCA::get_mat_by_index."
    )
    expect_error(
        a$get_mat_by_index(alpha_v = 1),
        "Invalid index in SFPCA::get_mat_by_index."
    )
        expect_error(
        a$get_mat_by_index(lambda_u = 1),
        "Invalid index in SFPCA::get_mat_by_index."
    )
    expect_error(
        a$get_mat_by_index(lambda_v = 1),
        "Invalid index in SFPCA::get_mat_by_index."
    )
    expect_no_error(
        a$get_mat_by_index(),
        "Invalid index in SFPCA::get_mat_by_index."
    )

)
expect_error(
a$left_project(matrix(0, 4, 1)),
"`newX` is incompatible with orignal data."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we say what the correct dimensions should be in this error message?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

)
expect_warning(
a$get_mat_by_index(alphau = 1),
paste0("extra argument ", sQuote("alphau"), " will be disregarded")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"alphau" -> "alpha_u"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this one is intended. When users have a typo or give other arguments we don't use, we give an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Do you want a warning or a full error here? (I can go either way)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it should be just a warning. I made a mistake when I said "we give an error".



test_that("SFPCA object wrappers: moma_twfpca", {
X <- matrix(runif(17 * 8), 17, 8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to have set.seed in all tests with randomness, just in case something weird happens on Travis that needs to be debugged locally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a set.seed at the beginning of this file already.

Added anyway.

)
expect_error(
a$interpolate(),
"R6 ojbect SFPCA do not support interpolation when BIC selection scheme has been used."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"object"

Also - I'm a bit confused by this one: shouldn't the error message reflect the fact that we aren't saying where interpolation is happening? The BIC thing, while valid, seems like a bit of a fakeout. (I.e., if you tell me to interpolate from grid results, but don't give parameter values, I'd still expect an error.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    a <- moma_sfpca(X,
        u_sparsity = lasso(), v_sparsity = lasso(),
        lambda_u = c(1, 2), lambda_v = c(1, 2),
        alpha_u = c(1, 2),
        alpha_v = c(1, 2), # selected by BIC
        Omega_u = second_diff_mat(17), Omega_v = second_diff_mat(8),
        selection_scheme_str = "gggb"
    )
    expect_error(
        a$interpolate(),
        "R6 object SFPCA do not support interpolation when BIC selection scheme has been used."
    )
    expect_error(
        a$interpolate(
            lambda_u = 1.1, lambda_v = 1.3,
            alpha_u = 1.4
        ),
        "R6 object SFPCA do not support interpolation when BIC selection scheme has been used."
    )

Current design does not support interpolation if BIC scheme has been used for any parameters because I find it hard to make sense of it.

a$interpolate(
alpha_u = 1, exact = TRUE
),
"Invalid index in SFPCA::interpolate: alpha_u."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to throw an error for exact = TRUE mode? If we're already (theoretically) re-solving, what harm is there from introducing a new parameter value ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified this part to make it clearer.

    # Once BIC is used, `SFPCA::interpolate`
    # cannot be called at all.
    a <- moma_sfpca(X,
        u_sparsity = lasso(), v_sparsity = lasso(),
        lambda_u = c(1, 2), lambda_v = c(1, 2),
        alpha_u = c(1, 2),
        alpha_v = c(1, 2), # selected by BIC
        Omega_u = second_diff_mat(17), Omega_v = second_diff_mat(8),
        selection_scheme_str = "gggb"
    )
    expect_error(
        a$interpolate(),
        "R6 object SFPCA do not support interpolation when BIC selection scheme has been used."
    )
    expect_error(
        a$interpolate(
            lambda_u = 1.1, lambda_v = 1.3,
            alpha_u = 1.4
        ),
        "R6 object SFPCA do not support interpolation when BIC selection scheme has been used."
    )

Current design does not support interpolation if BIC scheme has been used for any parameters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. We might want to relax this at some point in the future, but makes sense now.

Copy link
Member

@michaelweylandt michaelweylandt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor things, but I think this looks good to merge. Let me know when to hit the approve button and merge it.

@@ -55,8 +55,8 @@ sfpca <- function(X,
lambda_u <- as.vector(lambda_u)
lambda_v <- as.vector(lambda_v)

Omega_u <- if (is.null(Omega_u)) diag(dim(X)[1]) else Omega_u
Omega_v <- if (is.null(Omega_v)) diag(dim(X)[2]) else Omega_v
Omega_u <- Omega_u %||% diag(dim(X)[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

set.seed(12)
X <- matrix(seq(1:12), 4, 3)
X[1, 1] <- "abc"
expect_error(SFPCA$new(X), "X must not have NaN, NA, or Inf.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem quite right as an error message / test, though I agree it's an error. Maybe add a different (separate) check that is.double(X)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

expect_equal(a$v_sparsity, empty())

expect_error(
SFPCA$new(matrix(runif(12), 3, 4), selection_scheme_str = "bbba"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still find this a bit confusing, but let's stick with it for now and address in #43 / #48

)
expect_error(
SFPCA$new(X, rank = 0, center = FALSE, scale = FALSE),
"`rank` should be a positive integer smaller than the rank of the data matrix."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we actually checking the "real" rank here (e.g., Matrix::rankMatrix) or just min(n, p)? Since we aren't using real singular vectors, it is possible to have a higher SFPCA-rank than real-rank.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just min(n, p).

)
expect_warning(
a$get_mat_by_index(alphau = 1),
paste0("extra argument ", sQuote("alphau"), " will be disregarded")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Do you want a warning or a full error here? (I can go either way)


# test selection schemes
expect_error(
moma_spca(X,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does moma_spca only allow for sparse U? In general, Sparse PCA refers to sparse V because most Sparse PCA is defined in the covariance model not the SVD model.

In fact, I'd almost think about making moma_spca only do sparse V. I can't really think of a standard method that is sparse U only...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then should we remove twspca?

I'd provide a slightly broader range of choices and let the user make the decision.

expect_true(all(a$selection_scheme_list == c(0, 0, 0, 0)))

a <- moma_fpca(X,
alpha_u = seq(0, 2, 0.2),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above - folks might only expect FPCA to do smooth V. Let's do the same thing here as we decide for moma_spca.

a$interpolate(
alpha_u = 1, exact = TRUE
),
"Invalid index in SFPCA::interpolate: alpha_u."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. We might want to relax this at some point in the future, but makes sense now.

@michaelweylandt
Copy link
Member

Also, let me know if you want to rebase / clean up the commit history before merging.

@codecov
Copy link

codecov bot commented Aug 5, 2019

Codecov Report

Merging #42 into master will decrease coverage by 0.24%.
The diff coverage is 89.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #42      +/-   ##
==========================================
- Coverage   90.22%   89.98%   -0.25%     
==========================================
  Files          30       32       +2     
  Lines        2016     2487     +471     
==========================================
+ Hits         1819     2238     +419     
- Misses        197      249      +52
Impacted Files Coverage Δ
R/moma_arguments.R 92.13% <ø> (ø) ⬆️
src/moma_level1.cpp 94.9% <100%> (+0.42%) ⬆️
R/util.R 100% <100%> (ø)
R/moma_expose.R 93% <100%> (+0.21%) ⬆️
R/sfpca.R 98.18% <100%> (ø) ⬆️
R/moma_sfpca.R 88.74% <88.74%> (ø)
src/moma_solver.cpp 94.89% <0%> (-0.73%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27adfa8...be02e5c. Read the comment docs.

@michaelweylandt michaelweylandt merged commit 14c8ca9 into DataSlingers:master Aug 5, 2019
@Banana1530 Banana1530 deleted the r6pcawrapper branch August 20, 2019 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants