Deflation for CCA and LDA #46

Banana1530 · 2019-07-21T16:00:11Z

No description provided.

codecov · 2019-07-26T04:03:24Z

Codecov Report

Merging #46 into master will decrease coverage by 13.16%.
The diff coverage is 55.5%.

@@            Coverage Diff            @@
##           master    #46       +/-   ##
=========================================
- Coverage   90.17%    77%   -13.17%     
=========================================
  Files          32     34        +2     
  Lines        2372   3558     +1186     
=========================================
+ Hits         2139   2740      +601     
- Misses        233    818      +585

| Impacted Files | Coverage Δ | |
|---|---|---|
| R/moma_solve.R | `98.07% <ø> (ø)` | |
| src/moma.h | `100% <ø> (ø)` | ⬆️ |
| src/moma_solver.h | `100% <ø> (ø)` | ⬆️ |
| src/moma_base.h | `100% <ø> (ø)` | ⬆️ |
| src/moma_solver_BICsearch.cpp | `100% <100%> (ø)` | ⬆️ |
| R/util.R | `100% <100%> (+4.54%)` | ⬆️ |
| src/moma_solver.cpp | `94.73% <100%> (ø)` | ⬆️ |
| R/moma_sflda.R | `44.74% <44.74%> (ø)` | |
| R/moma_sfpca.R | `70.62% <47.59%> (-20.02%)` | ⬇️ |
| R/moma_sfcca.R | `56% <56%> (ø)` | |

... and 8 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 00839b5...ff6da69.

michaelweylandt · 2019-08-03T16:58:04Z

src/moma.cpp

@@ -56,6 +57,9 @@ MoMA::MoMA(const arma::mat &i_X,  // Pass X_ as a reference to avoid copy
               i_X.n_cols)
 // const reference must be passed to initializer list
 {
+    ds         = DeflationScheme::PCA;


Per my recent paper: maybe call this HotellingPCA since there are other PCA deflation approaches.

michaelweylandt · 2019-08-03T16:59:30Z

src/moma.cpp

+        arma::mat X_cv   = X_working * u;
+        double norm_X_cv = arma::norm(X_cv);
+
+        // subtract cv's out of X_working and Y_working


I don't see any changes to Y_working here.

https://www.overleaf.com/7692846932qhrncsnkpygj

Here I came up with a heuristic deflation scheme while you were on a trip. For LDA, I deflate X only while leaving Y, the indicator matrix, unchanged.

The doc also describes how LDA fits into MoMA.

I'm gonna set up a meeting with @agenevera to discuss and confirm we're all on the same page.

Thoughts on implementing Schur Complement and Projection Deflation (probably normalized) from Section 3 of https://arxiv.org/abs/1907.12012 as well?

michaelweylandt · 2019-08-03T17:01:22Z

The CodeCov report is excellent! Thanks for adding.

Can we add CodeCov first as a small separate PR so we get reports for all the big PRs before we merge them?

michaelweylandt · 2019-08-03T17:02:51Z

It looks like some of the commits got misordered: I'm not sure how 91b8210 can follow 76280eb, since the former uses the two-argument form of MoMA, which is only added in the latter.

michaelweylandt · 2019-08-05T18:22:57Z

This can now be rebased (not merged) against the current master.

michaelweylandt · 2019-08-06T22:03:54Z

One thought re:testing (e.g., 31ef3c1): in addition to testing that you get the same results from an R and C++ implementation, you can also test that the theoretical properties hold: e.g., if doing Hotelling deflation with regularization, we would expect u^T X_defl v = 0, but not u^T X_defl = 0, while both should be zero for Schur and projection deflation.

It's not the strictest test in the world, but it's at least a different thing to test which is often a good complement to just checking that you coded the same thing in two languages.
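The property check suggested above could be sketched roughly as follows. This is a hedged illustration in plain Python, not the package's R or C++ code. It assumes the usual unit-norm forms of the three schemes (Hotelling: X - d u v^T with d = u^T X v; projection: (I - u u^T) X (I - v v^T); Schur complement: X - X v u^T X / d), and it uses arbitrary unit vectors u and v to stand in for regularized estimates that are not exact singular vectors. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(x):
    n = math.sqrt(dot(x, x))
    return [xi / n for xi in x]


def mat_vec(X, v):
    """Return X v."""
    return [dot(row, v) for row in X]


def vec_mat(u, X):
    """Return u^T X."""
    return [dot(u, col) for col in zip(*X)]


def hotelling(X, u, v):
    # X - d u v^T, with d = u^T X v (u and v assumed unit-norm)
    d = dot(u, mat_vec(X, v))
    return [[X[i][j] - d * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))]


def projection(X, u, v):
    # (I - u u^T) X (I - v v^T), expanded term by term
    uX, Xv = vec_mat(u, X), mat_vec(X, v)
    d = dot(u, Xv)
    return [[X[i][j] - u[i] * uX[j] - Xv[i] * v[j] + d * u[i] * v[j]
             for j in range(len(v))] for i in range(len(u))]


def schur(X, u, v):
    # X - (X v)(u^T X) / (u^T X v); requires u^T X v != 0
    uX, Xv = vec_mat(u, X), mat_vec(X, v)
    d = dot(u, Xv)
    return [[X[i][j] - Xv[i] * uX[j] / d for j in range(len(v))]
            for i in range(len(u))]


def max_abs(x):
    return max(abs(xi) for xi in x)


# Toy data: u and v are deliberately NOT singular vectors of X.
X = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0]]
u = normalize([1.0, 1.0, 0.0])
v = normalize([0.0, 1.0, 1.0])

for name, deflate in [("hotelling", hotelling),
                      ("projection", projection),
                      ("schur", schur)]:
    Xd = deflate(X, u, v)
    # Residuals: |u^T Xd v|, max|u^T Xd|, max|Xd v|
    print(name, abs(dot(u, mat_vec(Xd, v))),
          max_abs(vec_mat(u, Xd)), max_abs(mat_vec(Xd, v)))
```

In a test suite, the printed residuals would become tolerance-based assertions: the two-sided residual should vanish for all three schemes, while the one-sided residuals should vanish only for projection and Schur complement deflation.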

Banana1530 · 2019-08-08T15:01:27Z

Good idea. Added.

michaelweylandt · 2019-08-08T15:03:05Z

Great - just an FYI: @agenevera and I are working on systematizing the deflation / orthogonality related stuff, so let's focus on the other two open PRs for the next few days. I'm gonna try to write up our thoughts before I go out of town Saturday, but I'm not sure exactly when that will be done by.

Banana1530 · 2019-08-08T15:24:17Z

Gotcha.

Banana1530 · 2019-08-14T09:14:37Z

R/moma_sfcca.R

+            ))
+        },
+
+        plot = function() {


This PR contains all the work in #47, i.e., Shiny apps for PCA, LDA and CCA.

Banana1530 · 2019-08-14T09:17:08Z

R/moma_sfcca.R

+#' in nested greedy BIC selection scheme.
+#' @param rank A positive integer. Defaults to 1. The maximal rank, i.e., maximal number of principal components to be used.
+#' @export
+


Parameters described here will be reused in moma_sflda and other relevant functions.

Banana1530 · 2019-08-14T09:24:55Z

R/moma_sfcca.R

+        },
+        private_error_if_extra_arg = function(..., is_missing) {
+            is_fixed <- self$fixed_list
+            if (any(is_fixed == TRUE & is_missing == FALSE)) {


That is, when a parameter is "fixed" but the user mistakenly gives a value, we give an error.
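A minimal sketch of that check, written in Python for illustration only. The dictionary-based interface and the function name are assumptions and do not mirror the R6 method's actual signature: `fixed` marks which parameters are pinned, and `missing` marks which ones the caller left unspecified.

```python
def error_if_extra_arg(fixed, missing):
    """Raise if any fixed parameter was explicitly supplied by the caller.

    fixed:   dict mapping parameter name -> True if the parameter is fixed
    missing: dict mapping parameter name -> True if the caller omitted it
    (Both structures are hypothetical stand-ins for the R lists.)
    """
    offending = sorted(
        name for name, is_fixed in fixed.items()
        if is_fixed and not missing.get(name, True)
    )
    if offending:
        raise ValueError(
            "Fixed parameter(s) must not be given a value: "
            + ", ".join(offending)
        )


# alpha_u is fixed and omitted, lambda_u is free and supplied: accepted.
error_if_extra_arg(
    fixed={"alpha_u": True, "lambda_u": False},
    missing={"alpha_u": True, "lambda_u": False},
)
```

Supplying a value for `alpha_u` in the call above (that is, `missing={"alpha_u": False, ...}`) would raise the error.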

Banana1530 · 2019-08-14T09:42:30Z

src/moma.h

+        // Pass X_ as a reference to avoid copy
+        const arma::mat &X_working,
+        const arma::mat &Y_working,
+        /*


This new initializer takes two matrices instead of one.

Banana1530 · 2019-08-14T09:44:56Z

src/moma_expose.cpp

+
+// [[Rcpp::export]]
+Rcpp::List cca(const arma::mat &X,  // We should not change any variable in R, so const ref
+               const arma::mat &Y,


This function is used to expose the MoMA::grid_BIC_mix method, the same as the C++ function cpp_multirank_BIC_grid_search. The difference is that it uses the two-matrix initializer.

michaelweylandt · 2019-08-19T20:19:40Z

- This needs to be merged into develop, not master, under the new branch scheme.
- Can we increase coverage of the new code? This is a big hit compared to the previous version.

michaelweylandt · 2019-08-19T20:21:31Z

Can you pull the Shiny stuff out of this PR and just leave it as CCA and LDA? That might help the coverage numbers.

Also, can you clean up the history? It should be possible to move the typo fixes into the original commits for a shorter history.

michaelweylandt

There's too much going on in one PR here - can we break this into 3 or 4 smaller PRs?

michaelweylandt · 2019-08-19T20:22:43Z

R/moma-R6.R

+#' The following table lists the supported methods for
+#' R6 objects generated by \code{moma_*pca}, \code{moma_*cca}
+#' and \code{moma_*lda} types functions.
+#' \tabular{lccccccc}{


I believe roxygen2 supports Markdown syntax as well. That might be an easier way to make a table.

Markdown syntax for tables doesn't work, even if I turn on Markdown support for the package.

michaelweylandt · 2019-08-19T20:23:01Z

R/moma-R6.R

+#' @section Members:
+#'
+#' \describe{
+#'   \item{


See above - this might be easier with Markdown syntax.

michaelweylandt · 2019-08-19T20:23:38Z

R/moma_expose.R

        moma_error(
            "Sparse penalty should be of class ",
-            sQuote("moma_sparsity"),
+            sQuote("_moma_sparsity_type"),


Why are we using a leading _ prefix here?

Because functions like lasso() and scad() are not exposed to users, while moma_lasso() and moma_scad() are. The latter take the extra arguments lambda and select_scheme.

michaelweylandt · 2019-08-19T20:26:19Z

R/moma_sfcca.R

+#' = ( I - { c_x } \left(  { c_x } ^ { T }  { c_x } \right) ^ { - 1 }  { c_x } ^ { T } )X,}
+#'
+#' \eqn{ Y \leftarrow { Y } -  { c_y } \left(  { c_y } ^ { T }  { c_y } \right) ^ { - 1 }  { c_y } ^ { T }  { Y }
+#' = (I -  { c_y } \left(  { c_y } ^ { T }  { c_y } \right) ^ { - 1 }  { c_y } ^ { T } ) Y}.


Isn't this just "normal" (PCA-style) projection deflation on the X^TY matrix? That might be an easier way to implement it. I'm not sure we need to break these up at the C++ level.

It might be useful to bookkeep the deflated X and Y.
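For reference, the per-matrix update in the documented formula can be sketched as below. This is an illustrative Python version under stated assumptions, not the C++ implementation: it takes the canonical variate to be c_x = X u, as in the `X_cv = X_working * u` hunk earlier in this PR, and the helper names are hypothetical. It does not settle whether this is equivalent to projection deflation on X^T Y.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(X, v):
    """Return X v."""
    return [dot(row, v) for row in X]


def project_out(X, c):
    """Return (I - c (c^T c)^{-1} c^T) X for a nonzero column vector c."""
    cc = dot(c, c)
    cX = [dot(c, col) for col in zip(*X)]  # c^T X, one entry per column
    return [[X[i][j] - c[i] * cX[j] / cc for j in range(len(cX))]
            for i in range(len(X))]


# Toy data: four observations, two features.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 9.0]]
u = [1.0, -1.0]

c_x = mat_vec(X, u)            # canonical variate of X along u
X_new = project_out(X, c_x)    # deflated X, kept separately for bookkeeping

# After deflation, c_x is orthogonal to every column of X_new,
# and the variate of X_new along the same u vanishes.
print([dot(c_x, col) for col in zip(*X_new)])
print(mat_vec(X_new, u))
```

The same helper would apply to Y with c_y = Y v. Keeping the deflated X and Y as separate matrices is what allows the bookkeeping mentioned above.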

michaelweylandt · 2019-08-19T20:27:43Z

R/moma_sfcca.R

+moma_scca <- function(X, ..., Y,
+                      center = TRUE, scale = FALSE,
+                      x_sparse = moma_empty(), y_sparse = moma_empty(),
+                      #    x_smooth = moma_smoothness(), y_smooth = moma_smoothness(),


Every time I read this, it's still confusing to me: why is no smoothing the default if we recommend SFPCA as a general case?

moma_scca is supposed to impose only sparsity, isn't it?

What happens in moma_smoothness is that if the user specifies alpha explicitly but not Omega, we use the second-difference matrix for smoothing.

Also, please check out this test case:

https://github.com/Banana1530/MoMA-1/blob/64a34bbad62aa3dc2c553a70dc8e9be980477041/tests/testthat/test_sfpca_wrapper.R#L647

and the latest commit, "Give info: use sec-diff-mat as default":

check_omega <- function(Omega, alpha, n) {
    ## LOGIC:
    # if alpha = 0: overwrite Omega_u to identity matrix whatever it was
    # if alpha is a grid or a non-zero scalar:
    #     if Omega missing: set to second-difference matrix
    #     else check validity
    ...
}

In conclusion, if the user specifies alpha, they get the second-difference matrix (or whatever Omega they supply) as the smoothing matrix; if they do NOT specify alpha, they get no smoothing.
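The defaulting logic described in this reply can be sketched as follows. This is a hedged Python illustration, not the R function itself. The thread does not show how the package builds its second-difference matrix, so the sketch assumes one common convention, D^T D with D the (n-1) x n first-difference operator, and reduces the validity check to a shape check.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def second_diff_mat(n):
    # ASSUMPTION: D^T D, where D is the (n-1) x n first-difference operator.
    # The package's actual construction may differ.
    D = [[-1.0 if j == i else 1.0 if j == i + 1 else 0.0 for j in range(n)]
         for i in range(n - 1)]
    return [[sum(D[k][i] * D[k][j] for k in range(n - 1)) for j in range(n)]
            for i in range(n)]


def check_omega(omega, alpha, n):
    """Pick the smoothing matrix following the commented LOGIC.

    omega: n x n list of lists, or None when the user did not supply one
    alpha: a scalar, or a list/tuple treated as a grid of values
    """
    is_grid = isinstance(alpha, (list, tuple))
    if not is_grid and alpha == 0:
        # No smoothing requested: Omega is overwritten with the identity.
        return identity(n)
    if omega is None:
        # Smoothing requested without Omega: fall back to the default.
        return second_diff_mat(n)
    # Smoothing requested with Omega: validate (shape check only here).
    if len(omega) != n or any(len(row) != n for row in omega):
        raise ValueError("Omega must be an n x n matrix")
    return omega


print(check_omega(None, 0, 3))           # identity, alpha is zero
print(check_omega(None, [0.5, 1.0], 4))  # default second-difference matrix
```

A real validity check would presumably also verify symmetry and positive semi-definiteness; those steps are omitted here.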

michaelweylandt · 2019-08-19T20:28:28Z

R/moma_sfpca.R

+            return(res)
+        },
+        private_left_project = function(newX, ...,
+                                                alpha_u = 1, alpha_v = 1, lambda_u = 1, lambda_v = 1, rank = 1) {


Is this really formatted correctly? It looks super weird.

rstyler did process the file.

I didn't find any useful info on customizing this type of alignment.

Strangely, the snippet here (#46 (comment)) has the correct indentation.

michaelweylandt · 2019-08-19T20:28:42Z

R/moma_sfpca.R

            )
            private$check_input_index <- TRUE
            return(res)
+        },
+        private_error_if_not_indeces = function(...,


Banana1530 · 2019-08-21T06:05:16Z

It is moved to PR #54.

Banana1530 force-pushed the ccadeflation branch from 691fb94 to 14b3509 Compare July 22, 2019 03:02

michaelweylandt reviewed Aug 3, 2019

Banana1530 force-pushed the ccadeflation branch from 14f39c4 to b968a73 Compare August 6, 2019 08:14

Banana1530 force-pushed the ccadeflation branch from 31ef3c1 to 37d3de4 Compare August 7, 2019 07:18

Banana1530 force-pushed the ccadeflation branch from a061e5b to 3aecc1a Compare August 8, 2019 15:23

Banana1530 added 12 commits August 10, 2019 15:02

Change of function name: reset -> set_penalty

263336e

Extend C++ class MoMA to include CCA and LDA

19be549

C++ interface for CCA and LDA

947034b

Remove redundancy in moma_sfpca.R

acf0b1c

Add SFCCA$initialize() and tests

d1567a3

Add SFLDA$initialize() and tests

d492d73

Update error message

6663580

Add PCA deflation schemes on C++ side

09d5429

Expose deflation_scheme to R

ec80aaf

Add tests for PCA deflations

01e4e71

Add orthogonality tests

e87aa67

Add shiny app

ece8f2a

Banana1530 force-pushed the ccadeflation branch from 3aecc1a to e87aa67 Compare August 10, 2019 09:55

Banana1530 added 4 commits August 13, 2019 12:20

Change test files' names

48405bc

Update test cases

9b46b29

pg_setting -> pg_settings

ffacbb8

moma_sparsity -> _moma_sparsity_type

94042d8

Banana1530 added 8 commits August 13, 2019 14:41

selection_scheme_str -> select_scheme_list

de119ce

selection_criterion* -> select_scheme*

3d65666

docs

3e97195

Refactor SFPCA and update tests

80f468b

Refactor SFLDA and SFCCA

257e69d

Add CCA/LDA description, typo

541ce57

Fix the build

1a89545

Fix a typo:

2aebe98

Banana1530 commented Aug 14, 2019

Banana1530 added 7 commits August 14, 2019 17:53

Use error_if_not in SFCCA / SFLDA

157c4e5

Add vignettes

eb70115

Change template

321c93c

Update comments

8d62904

Move

ca3a92b

Update docs - more on methods in R6 objects

52485a1

\| \| -> ||

ff6da69

michaelweylandt reviewed Aug 19, 2019

Banana1530 mentioned this pull request Aug 21, 2019

NEW Deflation for CCA and LDA #54

Merged

Banana1530 closed this Aug 23, 2019
