-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deflation for CCA and LDA #46
Conversation
Codecov Report
@@ Coverage Diff @@
## master #46 +/- ##
=========================================
- Coverage 90.17% 77% -13.17%
=========================================
Files 32 34 +2
Lines 2372 3558 +1186
=========================================
+ Hits 2139 2740 +601
- Misses 233 818 +585
Continue to review full report at Codecov.
|
src/moma.cpp
Outdated
@@ -56,6 +57,9 @@ MoMA::MoMA(const arma::mat &i_X, // Pass X_ as a reference to avoid copy | |||
i_X.n_cols) | |||
// const reference must be passed to initializer list | |||
{ | |||
ds = DeflationScheme::PCA; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per my recent paper: maybe call this HotellingPCA since there are other PCA deflation approaches.
src/moma.cpp
Outdated
arma::mat X_cv = X_working * u; | ||
double norm_X_cv = arma::norm(X_cv); | ||
|
||
// subtract cv's out of X_working and Y_working |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any changes to Y_working here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.overleaf.com/7692846932qhrncsnkpygj
Here I came up with a heuristic deflation scheme while you were on a trip. For LDA, I deflate X only while leaving Y, the indicator matrix, unchanged.
The doc also describes how LDA fits into MoMA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna set up a meeting with @agenevera to discuss and confirm we're all on the same page.
Thoughts on implementing Schur Complement and Projection Deflation (probably normalized) from Section 3 of https://arxiv.org/abs/1907.12012 as well?
The CodeCov report is excellent! Thanks for adding. Can we add CodeCov first as a small separate PR so we get reports for all the big PRs before we merge them? |
This can now be rebased (not merged) against the current |
14f39c4
to
b968a73
Compare
One thought re:testing (e.g., 31ef3c1): in addition to testing that you get the same results from an R and C++ implementation, you can also test that the theoretical properties hold: e.g., if doing Hotelling deflation with regularization, we would expect u^T X_defl v = 0, but not u^T X_defl = 0, while both should be zero for Schur and projection deflation. It's not the strictest test in the world, but it's at least a different thing to test which is often a good complement to just checking that you coded the same thing in two languages. |
31ef3c1
to
37d3de4
Compare
Good idea. Added. |
Great - just an FYI: @agenevera and I are working on systemitizing the deflation / orthogonality related stuff, so let's focus on the other two open PRs for the next few days. I'm gonna try to write up our thoughts before I go out of town Saturday, but I'm not sure exactly when that will be done by. |
a061e5b
to
3aecc1a
Compare
Gotcha. |
3aecc1a
to
e87aa67
Compare
)) | ||
}, | ||
|
||
plot = function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR contains all the work in #47, i.e., Shiny apps for PCA, LDA and CCA.
#' in nested greedy BIC selection scheme. | ||
#' @param rank A positive integer. Defaults to 1. The maximal rank, i.e., maximal number of principal components to be used. | ||
#' @export | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameters described here will be reused in moma_sflda
and other relevant functions.
}, | ||
private_error_if_extra_arg = function(..., is_missing) { | ||
is_fixed <- self$fixed_list | ||
if (any(is_fixed == TRUE & is_missing == FALSE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, when a parameter is "fixed" but the user mistakenly gives a value, we give an error.
// Pass X_ as a reference to avoid copy | ||
const arma::mat &X_working, | ||
const arma::mat &Y_working, | ||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new initializer takes two matrix instead of one.
|
||
// [[Rcpp::export]] | ||
Rcpp::List cca(const arma::mat &X, // We should not change any variable in R, so const ref | ||
const arma::mat &Y, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is used to expose the MoMA::grid_BIC_mix
method, the same as the C++ function cpp_multirank_BIC_grid_search
. This difference lies in that it use the two-matrix initializer.
|
Can you pull the Shiny stuff out of this PR and just leave it as CCA and LDA? That might help the coverage numbers. Also, can you clean up the history? It should be possible to move the typo fixes into the original commits for a shorter history. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's too much going on in one PR here - can we break this into 3 or 4 smaller PRs?
#' The following table lists the supported methods for | ||
#' R6 objects generated by \code{moma_*pca}, \code{moma_*cca} | ||
#' and \code{moma_*lda} types functions. | ||
#' \tabular{lccccccc}{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe roxygen2
supports Markdown syntax as well. That might be an easier way to make a table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Markdown syntax for tables don't work, even if I turn on markdown support for a package.
#' @section Members: | ||
#' | ||
#' \describe{ | ||
#' \item{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above - this might be easier with Markdown syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same.
moma_error( | ||
"Sparse penalty should be of class ", | ||
sQuote("moma_sparsity"), | ||
sQuote("_moma_sparsity_type"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using a leading _
prefix here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because lasso()
, scad()
types of functions are not exposed to users, but moma_lasso()
, moma_scad()
are. The latter have extra arguments lambda
and select_scheme
.
#' = ( I - { c_x } \left( { c_x } ^ { T } { c_x } \right) ^ { - 1 } { c_x } ^ { T } )X,} | ||
#' | ||
#' \eqn{ Y \leftarrow { Y } - { c_y } \left( { c_y } ^ { T } { c_y } \right) ^ { - 1 } { c_y } ^ { T } { Y } | ||
#' = (I - { c_y } \left( { c_y } ^ { T } { c_y } \right) ^ { - 1 } { c_y } ^ { T } ) Y}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just "normal" (PCA-style) projection deflation on the X^TY matrix? That might an easier way to implement. I'm not sure we need to break these up at the C++ level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful to bookkeep the deflated X and Y.
moma_scca <- function(X, ..., Y, | ||
center = TRUE, scale = FALSE, | ||
x_sparse = moma_empty(), y_sparse = moma_empty(), | ||
# x_smooth = moma_smoothness(), y_smooth = moma_smoothness(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every time I read this, it's still confusing to me: why is no smoothing the default if we recommend SFPCA as a general case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moma_scca
is supposed to impose only sparsity, isn't it?
What happens in moma_smoothness
is if user specifies alpha explicitly but not Omega, then we use second different matrix as smoothing.
Also please check out this test case,
and the latest commit "Give info: use sec-diff-mat as default":
check_omega <- function(Omega, alpha, n) {
## LOGIC:
# if alpha = 0: overwrite Omega_u to identity matrix whatever it was
# if alpha is a grid or a non-zero scalar:
# if Omega missing: set to second-difference matrix
# else check validity
...
}
In conclusion, if the user specifies alpha, he gets sec-diff-mat or whatever he wants as the smoothing matrix; if he does NOT specify alpha, then he gets no smoothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
second different matrix -> second difference matrix
return(res) | ||
}, | ||
private_left_project = function(newX, ..., | ||
alpha_u = 1, alpha_v = 1, lambda_u = 1, lambda_v = 1, rank = 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really formatted correctly? It looks super weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rstyler
did process the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find any useful info on customizing this type of alignment.
Strangely here (#46 (comment)) has the correct indentation.
) | ||
private$check_input_index <- TRUE | ||
return(res) | ||
}, | ||
private_error_if_not_indeces = function(..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
It is moved to PR #54. |
No description provided.