add names to matrix and vector outputs #5

Open
corybrunson opened this issue Feb 4, 2022 · 5 comments

corybrunson commented Feb 4, 2022

The matrices u and v and the vectors d and cors in the output of PMA::CCA(), for example, are unnamed. This may be intentional, for compatibility with certain routines. For other purposes, though, it would help to carry the names from the input data matrices x and z into the output, and to give the canonical dimensions names of their own. Preserving names from the input data would, for example, make the output easier to read and make it easier to create other named objects from it, as in the example below.

I would suggest the following for the CCA() output, for example:

  • assign the column names of x (respectively, z) to the row names of u (v)
  • assign something like sCD1 through sCD<K> (for "sparse canonical dimension") to the column names of both u and v
  • assign the same sCD* names to d and cors

If this is of interest, then I would be glad to submit a PR with suggested assignments for the outputs of SPC(), CCA(), and MultiCCA(). Thank you!

library(PMA)

# CCA of life cycle savings data
savings_cca <- CCA(
  LifeCycleSavings[, c(2L, 3L)],
  LifeCycleSavings[, c(1L, 4L, 5L)],
  K = 2L, penaltyx = .7, penaltyz = .7
)
#> 12
#> 12

# without names
print(savings_cca$u)
#>      [,1] [,2]
#> [1,]   -1    1
#> [2,]    0    0
print(savings_cca$v)
#>           [,1]        [,2]
#> [1,] 0.2422123 -0.98598522
#> [2,] 0.9702233  0.14634075
#> [3,] 0.0000000 -0.08010956
# with names (suggested)
rownames(savings_cca$u) <- names(LifeCycleSavings)[c(2L, 3L)]
rownames(savings_cca$v) <- names(LifeCycleSavings)[c(1L, 4L, 5L)]
colnames(savings_cca$u) <- colnames(savings_cca$v) <-
  paste0("sCD", seq(savings_cca$K))
print(savings_cca$u)
#>       sCD1 sCD2
#> pop15   -1    1
#> pop75    0    0
print(savings_cca$v)
#>           sCD1        sCD2
#> sr   0.2422123 -0.98598522
#> dpi  0.9702233  0.14634075
#> ddpi 0.0000000 -0.08010956

# one benefit: data frame names
print(as.data.frame(savings_cca$u))
#>       sCD1 sCD2
#> pop15   -1    1
#> pop75    0    0
tibble::rownames_to_column(as.data.frame(savings_cca$v), var = "response")
#>   response      sCD1        sCD2
#> 1       sr 0.2422123 -0.98598522
#> 2      dpi 0.9702233  0.14634075
#> 3     ddpi 0.0000000 -0.08010956

# another benefit: reveal matrix multiplication error
t(savings_cca$u) %*% diag(savings_cca$d) %*% t(savings_cca$v) # wrong row names
#>             sr       dpi ddpi
#> sCD1 -10.01703 -40.12494    0
#> sCD2  10.01703  40.12494    0
savings_cca$u %*% diag(savings_cca$d) %*% t(savings_cca$v) # right row names
#>              sr      dpi      ddpi
#> pop15 -22.60722 -38.2563 -1.022931
#> pop75   0.00000   0.0000  0.000000

Created on 2022-02-04 by the reprex package (v2.0.1)
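The proposal could be packaged as a small post-processing function. The sketch below is hypothetical: add_scd_names() is not part of PMA, and it assumes only that a fit is a list with fields u, v, d and cors laid out as in the output above (one column of u and v, and one element of d and cors, per canonical dimension). The mock object holds placeholder values, not real CCA() results.

```r
# Hypothetical helper, not part of PMA: applies the proposed naming scheme
# to a CCA()-style list with fields u, v, d and cors.
add_scd_names <- function(fit, x, z) {
  # one "sparse canonical dimension" label per column of u
  dims <- paste0("sCD", seq_len(ncol(fit$u)))
  # column names of x label the rows of u; likewise z for v
  dimnames(fit$u) <- list(colnames(x), dims)
  dimnames(fit$v) <- list(colnames(z), dims)
  # the per-dimension vectors share the same labels
  names(fit$d) <- dims
  names(fit$cors) <- dims
  fit
}

# Inputs as in the reprex above
x <- LifeCycleSavings[, c(2L, 3L)]
z <- LifeCycleSavings[, c(1L, 4L, 5L)]

# Mock fit with placeholder values (not actual CCA() output)
mock_fit <- list(
  u = matrix(c(-1, 0, 1, 0), nrow = 2L),
  v = matrix(0, nrow = 3L, ncol = 2L),
  d = numeric(2L),
  cors = numeric(2L)
)

named_fit <- add_scd_names(mock_fit, x, z)
print(named_fit$u)
names(named_fit$cors)  # "sCD1" "sCD2"
```

Using colnames() rather than names() keeps such a helper working whether x and z are data frames or matrices.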

corybrunson (author) commented Feb 4, 2022

I realize now that the arguments xnames and znames partially resolve this issue. I think it would be appropriate for them to default to colnames(x) and colnames(z), respectively, and this would be part of the proposed PR. I apologize for overlooking that!
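A default of colnames(x) can work because R evaluates default arguments in the function's own frame, where x is already bound to the caller's input. The sketch below uses a hypothetical stand-in, cca_names_sketch(), that only mirrors the xnames and znames arguments; it is not PMA's implementation.

```r
# Hypothetical stand-in for CCA() (not PMA's code): it only shows how
# xnames and znames could default to the column names of x and z.
cca_names_sketch <- function(x, z, xnames = colnames(x), znames = colnames(z)) {
  # Default arguments are promises evaluated in this function's frame,
  # so colnames(x) refers to the x passed in this call.
  list(xnames = xnames, znames = znames)
}

x <- LifeCycleSavings[, c(2L, 3L)]
z <- LifeCycleSavings[, c(1L, 4L, 5L)]

cca_names_sketch(x, z)$xnames                        # "pop15" "pop75"
cca_names_sketch(as.matrix(x), as.matrix(z))$znames  # "sr" "dpi" "ddpi"
cca_names_sketch(x, z, xnames = c("a", "b"))$xnames  # "a" "b"
```

Because default arguments are evaluated lazily, colnames(x) is computed when xnames is first used; if a function body reassigns x before then, calling force(xnames) at the top would capture the names from the original input.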

bnaras added a commit that referenced this issue Feb 5, 2022
See [Issue 5](#5)
bnaras (owner) commented Feb 5, 2022

I pushed a commit with the defaults for xnames and znames.

corybrunson (author) commented

@bnaras very cool, thank you!

corybrunson (author) commented Feb 6, 2022

It looks like the names are not preserved through the process. If x and z are matrices, then names() does not return their column names; and if they are data frames, the scale() calls inside CCA() convert them to matrices before names() is called. Both problems should be solved by replacing names() with colnames(), which works on both data frames and matrices.

library(PMA)
sessioninfo::session_info(pkgs = "PMA")
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.6.0 (2019-04-26)
#>  os       macOS  10.15.7
#>  system   x86_64, darwin15.6.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2022-02-06
#>  pandoc   2.16.2 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package * version date (UTC) lib source
#>  PMA     * 1.2-2   2022-02-06 [1] Github (bnaras/PMA@8e3fd29)
#> 
#>  [1] /Users/jason.brunson/Library/R/3.6/library
#>  [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

# names of data frame inputs
names(LifeCycleSavings)
#> [1] "sr"    "pop15" "pop75" "dpi"   "ddpi"

# CCA of life cycle savings data
savings_cca <- CCA(
  as.matrix(LifeCycleSavings[, c(2L, 3L)]),
  as.matrix(LifeCycleSavings[, c(1L, 4L, 5L)]),
  K = 2L, penaltyx = .7, penaltyz = .7
)
#> 12
#> 12

# missing names
savings_cca$u
#>      [,1] [,2]
#> [1,]   -1    1
#> [2,]    0    0
savings_cca$xnames
#> NULL

Created on 2022-02-06 by the reprex package (v2.0.1)
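The difference between names() and colnames() described in the comment above can be checked in base R alone, without PMA. This sketch reuses the pop15 and pop75 columns from the reprex.

```r
# Base R only: compare names() and colnames() on data frame and matrix inputs
x_df  <- LifeCycleSavings[, c("pop15", "pop75")]
x_mat <- as.matrix(x_df)

names(x_df)      # "pop15" "pop75": a data frame's names are its column names
names(x_mat)     # NULL: a matrix stores column names in dimnames, not names
colnames(x_df)   # "pop15" "pop75"
colnames(x_mat)  # "pop15" "pop75"

# scale() converts a data frame to a matrix, so names() no longer finds them
x_scaled <- scale(x_df)
is.matrix(x_scaled)  # TRUE
names(x_scaled)      # NULL
colnames(x_scaled)   # "pop15" "pop75"
```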

bnaras (owner) commented Feb 6, 2022

Gah, that was my bad. Pushed a commit.
