Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more options to tidy.prcomp #557

Merged
merged 4 commits into from Apr 8, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
7 changes: 6 additions & 1 deletion DESCRIPTION
Expand Up @@ -212,7 +212,12 @@ Authors@R:
comment = c(ORCID = "0000-0002-9978-011X")),
person(given = "Jared",
family = "Wilber",
role = "ctb"))
role = "ctb"),
person(given = "Vilmantas",
family = "Gegzna",
email = "GegznaV@gmail.com",
role = "ctb",
comment = c(ORCID = "0000-0002-9500-5167")))
Description: Summarizes key information about statistical
objects in tidy tibbles. This makes it easy to report results, create
plots and consistently work with large numbers of models at once.
Expand Down
1 change: 1 addition & 0 deletions NEWS.md
Expand Up @@ -83,6 +83,7 @@ TODO: sort out what happens to `glance.aov()`
- Added `tidy.regsubsets()` for best subsets linear regression from the `leaps` package

- Added method `tidy.lm.beta()` to tidy `lm.beta` class models (#545 by @mattle24)
- `tidy.prcomp()` parameter `matrix` gained new options `"scores"`, `"loadings"`, and `"eigenvalues"` (#557 by @GegznaV)

- `tidy.kmeans()` now uses the names of the input variables in the output by
default. Set `col.names = NULL` to recover the old behavior.
Expand Down
34 changes: 19 additions & 15 deletions R/stats-prcomp-tidiers.R
Expand Up @@ -5,39 +5,42 @@
#' @param matrix Character specifying which component of the PCA should be
#' tidied.
#'
#' - `"u"`, `"samples"`, or `"x"`: returns information about the map from
#' the original space into principal components space.
#' - `"u"`, `"samples"`, `"scores"`, or `"x"`: returns information about
#' the map from the original space into principle components space.
#'
#' - `"v"`, `"rotation"`, or `"variables"`: returns information about the
#' map from principal components space back into the original space.
#' - `"v"`, `"rotation"`, `"loadings"` or `"variables"`: returns information
#' about the map from principle components space back into the original
#' space.
#'
#' - `"d"` or `"pcs"`: returns information about the eigenvalues.
#' - `"d"`, `"eigenvalues"` or `"pcs"`: returns information about the
#' eigenvalues.
#'
#' @template param_unused_dots
#'
#' @return A [tibble::tibble] with columns depending on the component of
#' PCA being tidied.
#'
#' If `matrix` is `"u"`, `"samples"`, or `"x"` each row in the tidied
#' output corresponds to the original data in PCA space. The columns are:
#' If `matrix` is `"u"`, `"samples"`, `"scores"`, or `"x"` each row in the
#' tidied output corresponds to the original data in PCA space. The columns
#' are:
#'
#' \item{`row`}{ID of the original observation (i.e. rowname from original
#' data).}
#' \item{`PC`}{Integer indicating a principal component.}
#' \item{`value`}{The score of the observation for that particular principal
#' component. That is, the location of the observation in PCA space.}
#'
#' If `matrix` is `"v"`, `"rotation"`, or `"variables"`, each row in the
#' tidied ouput corresponds to information about the principal components
#' in the original space. The columns are:
#' If `matrix` is `"v"`, `"rotation"`, `"loadings"` or `"variables"`, each
#' row in the tidied ouput corresponds to information about the principle
#' components in the original space. The columns are:
#'
#' \item{`row`}{The variable labels (colnames) of the data set on
#' which PCA was performed}
#' \item{`PC`}{An integer vector indicating the principal component}
#' \item{`value`}{The value of the eigenvector (axis score) on the
#' indicated principal component}
#'
#' If `matrix` is `"d"` or `"pcs"`, the columns are:
#' If `matrix` is `"d"`, `"eigenvalues"` or `"pcs"`, the columns are:
#'
#' \item{`PC`}{An integer vector indicating the principal component}
#' \item{`std.dev`}{Standard deviation explained by this PC}
Expand Down Expand Up @@ -91,17 +94,18 @@ tidy.prcomp <- function(x, matrix = "u", ...) {
stop("Must select a single matrix to tidy.", call. = FALSE)
}

MATRIX <- c("rotation", "x", "variables", "samples", "v", "u", "pcs", "d")
MATRIX <- c("rotation", "x", "variables", "samples", "v", "u", "pcs", "d",
"scores", "loadings", "eigenvalues")
matrix <- rlang::arg_match(matrix, MATRIX)

ncomp <- NCOL(x$rotation)
if (matrix %in% c("pcs", "d")) {
if (matrix %in% c("pcs", "d", "eigenvalues")) {
nn <- c("std.dev", "percent", "cumulative")
ret <- fix_data_frame(t(summary(x)$importance),
newnames = nn,
newcol = "PC"
)
} else if (matrix %in% c("rotation", "variables", "v")) {
} else if (matrix %in% c("rotation", "variables", "v", "loadings")) {
labels <- if (is.null(rownames(x$rotation))) {
1:nrow(x$rotation)
} else {
Expand All @@ -114,7 +118,7 @@ tidy.prcomp <- function(x, matrix = "u", ...) {
stringsAsFactors = FALSE
)
names(ret) <- c("column", "PC", "value")
} else if (matrix %in% c("x", "samples", "u")) {
} else if (matrix %in% c("x", "samples", "u", "scores")) {
labels <- if (is.null(rownames(x$x))) 1:nrow(x$x) else rownames(x$x)
samples <- tidyr::gather(as.data.frame(x$x))
ret <- data.frame(
Expand Down
9 changes: 6 additions & 3 deletions tests/testthat/test-stats-prcomp.R
Expand Up @@ -17,7 +17,8 @@ test_that("tidy.prcomp", {
check_tidy_output(td)
check_dims(td, 4, 4)
expect_identical(tidy(pc, matrix = "pcs"), td)

expect_identical(tidy(pc, matrix = "eigenvalues"), td)

td2 <- tidy(pc, matrix = "v")

check_tidy_output(td2, strict = FALSE)
Expand All @@ -29,15 +30,17 @@ test_that("tidy.prcomp", {
)
expect_identical(tidy(pc, matrix = "rotation"), td2)
expect_identical(tidy(pc, matrix = "variables"), td2)

expect_identical(tidy(pc, matrix = "loadings"), td2)

td3 <- tidy(pc, matrix = "u")

check_tidy_output(td3)
check_dims(td3, 200, 3)
expect_identical(tidy(pc, matrix = "x"), tidy(pc, matrix = "samples"))
expect_identical(tidy(pc, matrix = "x"), td3)
expect_identical(tidy(pc, matrix = "samples"), td3)

expect_identical(tidy(pc, matrix = "scores"), td3)

expect_error(
tidy(pc, matrix = c("d", "u")),
regexp = "Must select a single matrix to tidy."
Expand Down