Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deflation for CCA and LDA #46

Closed
wants to merge 31 commits into from

Conversation

Banana1530
Copy link
Collaborator

No description provided.

@codecov
Copy link

codecov bot commented Jul 26, 2019

Codecov Report

Merging #46 into master will decrease coverage by 13.16%.
The diff coverage is 55.5%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #46       +/-   ##
=========================================
- Coverage   90.17%    77%   -13.17%     
=========================================
  Files          32     34        +2     
  Lines        2372   3558     +1186     
=========================================
+ Hits         2139   2740      +601     
- Misses        233    818      +585
Impacted Files Coverage Δ
R/moma_solve.R 98.07% <ø> (ø)
src/moma.h 100% <ø> (ø) ⬆️
src/moma_solver.h 100% <ø> (ø) ⬆️
src/moma_base.h 100% <ø> (ø) ⬆️
src/moma_solver_BICsearch.cpp 100% <100%> (ø) ⬆️
R/util.R 100% <100%> (+4.54%) ⬆️
src/moma_solver.cpp 94.73% <100%> (ø) ⬆️
R/moma_sflda.R 44.74% <44.74%> (ø)
R/moma_sfpca.R 70.62% <47.59%> (-20.02%) ⬇️
R/moma_sfcca.R 56% <56%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00839b5...ff6da69. Read the comment docs.

src/moma.cpp Outdated
@@ -56,6 +57,9 @@ MoMA::MoMA(const arma::mat &i_X, // Pass X_ as a reference to avoid copy
i_X.n_cols)
// const reference must be passed to initializer list
{
ds = DeflationScheme::PCA;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my recent paper: maybe call this HotellingPCA since there are other PCA deflation approaches.

src/moma.cpp Outdated
arma::mat X_cv = X_working * u;
double norm_X_cv = arma::norm(X_cv);

// subtract cv's out of X_working and Y_working
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any changes to Y_working here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.overleaf.com/7692846932qhrncsnkpygj

Here I came up with a heuristic deflation scheme while you were on a trip. For LDA, I deflate X only while leaving Y, the indicator matrix, unchanged.

The doc also describes how LDA fits into MoMA.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna set up a meeting with @agenevera to discuss and confirm we're all on the same page.

Thoughts on implementing Schur Complement and Projection Deflation (probably normalized) from Section 3 of https://arxiv.org/abs/1907.12012 as well?

@michaelweylandt
Copy link
Member

The CodeCov report is excellent! Thanks for adding.

Can we add CodeCov first as a small separate PR so we get reports for all the big PRs before we merge them?

@michaelweylandt
Copy link
Member

It looks like some of the commits got misordered: I'm not sure how 91b8210 can follow 76280eb since the former uses the two argument form of MoMA which is only added in the latter.

@michaelweylandt
Copy link
Member

This can now be rebased (not merged) against the current master.

@michaelweylandt
Copy link
Member

One thought re:testing (e.g., 31ef3c1): in addition to testing that you get the same results from an R and C++ implementation, you can also test that the theoretical properties hold: e.g., if doing Hotelling deflation with regularization, we would expect u^T X_defl v = 0, but not u^T X_defl = 0, while both should be zero for Schur and projection deflation.

It's not the strictest test in the world, but it's at least a different thing to test which is often a good complement to just checking that you coded the same thing in two languages.

@Banana1530
Copy link
Collaborator Author

Good idea. Added.

@michaelweylandt
Copy link
Member

Great - just an FYI: @agenevera and I are working on systemitizing the deflation / orthogonality related stuff, so let's focus on the other two open PRs for the next few days. I'm gonna try to write up our thoughts before I go out of town Saturday, but I'm not sure exactly when that will be done by.

@Banana1530
Copy link
Collaborator Author

Gotcha.

))
},

plot = function() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR contains all the work in #47, i.e., Shiny apps for PCA, LDA and CCA.

#' in nested greedy BIC selection scheme.
#' @param rank A positive integer. Defaults to 1. The maximal rank, i.e., maximal number of principal components to be used.
#' @export

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters described here will be reused in moma_sflda and other relevant functions.

},
private_error_if_extra_arg = function(..., is_missing) {
is_fixed <- self$fixed_list
if (any(is_fixed == TRUE & is_missing == FALSE)) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, when a parameter is "fixed" but the user mistakenly gives a value, we give an error.

// Pass X_ as a reference to avoid copy
const arma::mat &X_working,
const arma::mat &Y_working,
/*
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new initializer takes two matrix instead of one.


// [[Rcpp::export]]
Rcpp::List cca(const arma::mat &X, // We should not change any variable in R, so const ref
const arma::mat &Y,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used to expose the MoMA::grid_BIC_mix method, the same as the C++ function cpp_multirank_BIC_grid_search . This difference lies in that it use the two-matrix initializer.

@michaelweylandt
Copy link
Member

michaelweylandt commented Aug 19, 2019

  • This needs to be merged into develop under the new branch scheme instead of master
  • Can we increase coverage of the new stuff - this is a big hit compared to the previous version.

@michaelweylandt
Copy link
Member

Can you pull the Shiny stuff out of this PR and just leave it as CCA and LDA? That might help the coverage numbers.

Also, can you clean up the history? It should be possible to move the typo fixes into the original commits for a shorter history.

Copy link
Member

@michaelweylandt michaelweylandt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's too much going on in one PR here - can we break this into 3 or 4 smaller PRs?

#' The following table lists the supported methods for
#' R6 objects generated by \code{moma_*pca}, \code{moma_*cca}
#' and \code{moma_*lda} types functions.
#' \tabular{lccccccc}{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe roxygen2 supports Markdown syntax as well. That might be an easier way to make a table.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Markdown syntax for tables don't work, even if I turn on markdown support for a package.

#' @section Members:
#'
#' \describe{
#' \item{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above - this might be easier with Markdown syntax.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

moma_error(
"Sparse penalty should be of class ",
sQuote("moma_sparsity"),
sQuote("_moma_sparsity_type"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using a leading _ prefix here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because lasso(), scad() types of functions are not exposed to users, but moma_lasso(), moma_scad() are. The latter have extra arguments lambda and select_scheme.

#' = ( I - { c_x } \left( { c_x } ^ { T } { c_x } \right) ^ { - 1 } { c_x } ^ { T } )X,}
#'
#' \eqn{ Y \leftarrow { Y } - { c_y } \left( { c_y } ^ { T } { c_y } \right) ^ { - 1 } { c_y } ^ { T } { Y }
#' = (I - { c_y } \left( { c_y } ^ { T } { c_y } \right) ^ { - 1 } { c_y } ^ { T } ) Y}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this just "normal" (PCA-style) projection deflation on the X^TY matrix? That might an easier way to implement. I'm not sure we need to break these up at the C++ level.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to bookkeep the deflated X and Y.

moma_scca <- function(X, ..., Y,
center = TRUE, scale = FALSE,
x_sparse = moma_empty(), y_sparse = moma_empty(),
# x_smooth = moma_smoothness(), y_smooth = moma_smoothness(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time I read this, it's still confusing to me: why is no smoothing the default if we recommend SFPCA as a general case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moma_scca is supposed to impose only sparsity, isn't it?

What happens in moma_smoothness is if user specifies alpha explicitly but not Omega, then we use second different matrix as smoothing.

Also please check out this test case,

https://github.com/Banana1530/MoMA-1/blob/64a34bbad62aa3dc2c553a70dc8e9be980477041/tests/testthat/test_sfpca_wrapper.R#L647

and the latest commit "Give info: use sec-diff-mat as default":

check_omega <- function(Omega, alpha, n) {

    ## LOGIC:
    # if alpha = 0: overwrite Omega_u to identity matrix whatever it was
    # if alpha is a grid or a non-zero scalar: 
    #       if Omega missing: set to second-difference matrix
    #       else check validity
...
}

In conclusion, if the user specifies alpha, he gets sec-diff-mat or whatever he wants as the smoothing matrix; if he does NOT specify alpha, then he gets no smoothing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

second different matrix -> second difference matrix

return(res)
},
private_left_project = function(newX, ...,
alpha_u = 1, alpha_v = 1, lambda_u = 1, lambda_v = 1, rank = 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really formatted correctly? It looks super weird.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rstyler did process the file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any useful info on customizing this type of alignment.

Strangely here (#46 (comment)) has the correct indentation.

)
private$check_input_index <- TRUE
return(res)
},
private_error_if_not_indeces = function(...,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indices

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@Banana1530
Copy link
Collaborator Author

It is moved to PR #54.

@Banana1530 Banana1530 closed this Aug 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants