extend subset parameter to all biplot layers #38

Open
corybrunson opened this issue Aug 27, 2021 · 14 comments
Labels
enhancement New feature or request

Comments

@corybrunson
Owner

The experimental subset parameter of GeomIsolines$setup_data() should be extended to all plot layers.

To consistently distinguish between ggplot() and ggbiplot(), it would be good, if possible, to enable this parameter only for *_rows_*() and *_cols_*() layers. (Should it be a parameter of stat_rows(), stat_cols(), and the other new stat layers rather than of Geom*$setup_data()?)

An important decision is what inputs subset should be able to handle. Among the possibilities:

  • a positive (negative) integer vector indicating the rows to include (exclude) – understood by [.data.frame() and dplyr::slice() but not by subset(); almost definitely worth including since it will come most naturally to users
  • a logical vector of length the number of rows – understood by [.data.frame() and subset() but not by dplyr::slice(); should not cause confusion, but could just be required to be which()ed instead of handled separately
  • a character vector of row names – understood by [.data.frame() but not by subset() or dplyr::slice(); could cause confusion because print.tbl_ord() uses tibble-like printing, which does not display row names as print.data.frame() does, but might be important for large data workflows
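For comparison, the three candidate input types can be tried against a toy data frame in base R. This is only a sketch: the data frame `df` and its row names are made up for illustration, and `dplyr::slice()` is left out so that the example needs no extra packages.

```r
# Toy data frame with row names; a stand-in for the rows of a matrix factor.
df <- data.frame(
  x = c(1.5, -0.3, 2.2, 0.7),
  row.names = c("a", "b", "c", "d")
)

# Positive (negative) integers include (exclude) rows by position.
by_int <- df[c(1, 3), , drop = FALSE]
by_neg <- df[-c(1, 3), , drop = FALSE]

# A logical vector with one entry per row.
by_lgl <- df[c(TRUE, FALSE, TRUE, FALSE), , drop = FALSE]

# A character vector of row names.
by_chr <- df[c("a", "c"), , drop = FALSE]

# subset() evaluates a logical condition but rejects integer positions.
by_cond <- subset(df, x > 1)
int_fails <- inherits(try(subset(df, c(1, 3)), silent = TRUE), "try-error")

# A logical vector can be reduced to the integer case with which().
by_which <- df[which(df$x > 1), , drop = FALSE]
```

All of the `[`-based calls select the same rows, which suggests that one internal representation (integer positions) could serve all three input types.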
@corybrunson corybrunson changed the title add parameter to extend subset parameter to all biplot layers Aug 27, 2021
@corybrunson corybrunson added this to To do in first CRAN submission via automation Aug 27, 2021
@corybrunson corybrunson added the enhancement New feature or request label Aug 27, 2021
@corybrunson
Owner Author

@jtr13, since you've used the package (and I'm so glad you've found it useful!), I'd be glad for your opinion. From the bulleted list above, which kind of input would you expect to give ggbiplot() in order to have only a subset of variable axes, or just one variable axis, plotted?

@jtr13

jtr13 commented Aug 28, 2021

Happy to give an opinion... I assume by row names you mean the column names of the original data frame. I (always) prefer character vectors of names but not critical if there are other considerations.

@corybrunson
Owner Author

Correct, when the data are provided as a data frame or matrix. Thank you!

@corybrunson
Owner Author

A draft solution is in the subset branch. The parameter is understood by the matrix stats (StatRows and StatCols) as well as by their corresponding extensions of StatScale, and can therefore be used with any geom layer that pairs with these stat layers (as the projection geom will).

It is not understood by the other stat layers because, as implemented, the subsetting would affect the results of their calculations. The logistic PCA examples illustrate its use with numerical input, though logical and character inputs are also accepted.
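To illustrate the distinction, here is a minimal base R sketch with made-up coordinates. It does not use ordr; the centroid is only a stand-in for any stat that summarizes its input.

```r
# Hypothetical row coordinates on two axes.
coords <- data.frame(
  x = c(1, 2, 3, 10),
  y = c(0, 1, 2, 9)
)
keep <- 1:3

# A pass-through layer: subsetting only changes which rows are drawn.
drawn <- coords[keep, ]

# A summarizing layer (here, a centroid): subsetting changes the answer.
centroid_all <- colMeans(coords)
centroid_sub <- colMeans(coords[keep, ])
```

With a pass-through stat the retained rows keep their coordinates, whereas the centroid moves once the fourth row is dropped.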

More examples, and unit tests, are needed, but this looks like a workable solution!

@jtr13

jtr13 commented Aug 28, 2021

Awesome! I'll give it a try and see if I can come up with some examples.

@corybrunson
Owner Author

Contributions are never obligatory but always welcome. Especially bug reports.

@jtr13

jtr13 commented Aug 29, 2021

Good to know!

@jtr13

jtr13 commented Aug 29, 2021

Works for me with integers but not characters... I'm not sure what the right syntax is. In the finches example, geom_rows_vector(alpha = .5, color = "darkred", subset = 3) works.

With geom_rows_vector(alpha = .5, color = "darkred", subset = "Isabella") I get

Warning in f(...) :
  Rows have no defined `.name`, so `subset` will be ignored.

And with geom_rows_vector(alpha = .5, color = "darkred", subset = Isabella) I get

Error in layer(data = data, mapping = mapping, stat = rows_stat(stat),  : 
  object 'Isabella' not found

As an aside, imho there is an overwhelming number of geoms in the package. I'd prefer, for example, for geom_rows_axis() to automatically draw tick marks and tick mark labels, with parameters to turn them off if desired. Of course, just a suggestion to take or leave!

@corybrunson
Owner Author

@jtr13 try using augment_ord() before fortifying / plotting the data. In order for row or column names to work, the row or column data frame passed to ggplot() has to have a .name field, which will be retrieved by augment_ord() if it is available from the model object. It's clunky, but I could not come up with a better way that didn't require a package-wide overhaul.

Though I should make the warning message clearer.
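The dependence on a .name field can be sketched in base R. The helper `resolve_subset()` below is hypothetical and is not part of ordr; the data frame and the placeholder names "IslandB" and "IslandC" are made up, and only the warning text is taken from the output quoted above.

```r
# Hypothetical helper (not part of ordr): turn `subset` into row positions.
resolve_subset <- function(data, subset) {
  if (is.null(subset)) return(seq_len(nrow(data)))
  if (is.character(subset)) {
    # Names can only be matched if the data carry a `.name` column.
    if (is.null(data[[".name"]])) {
      warning("Rows have no defined `.name`, so `subset` will be ignored.")
      return(seq_len(nrow(data)))
    }
    return(which(data[[".name"]] %in% subset))
  }
  if (is.logical(subset)) return(which(subset))
  # Integer positions: let R's own indexing handle negative values.
  seq_len(nrow(data))[subset]
}

rows <- data.frame(.name = c("Isabella", "IslandB", "IslandC"), x = 1:3)
by_name <- resolve_subset(rows, "Isabella")
by_drop <- resolve_subset(rows, -1)
by_flag <- resolve_subset(rows, c(TRUE, FALSE, TRUE))
```

Reducing every input type to integer positions keeps the downstream code path the same, and the character case is the only one that depends on names being available.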

@corybrunson
Owner Author

Never mind! I can reproduce the error after augment_ord(). I'll see if I can hack something.

@corybrunson
Owner Author

Commit a505a86 in the subset branch automatically maps the (custom) .name_subset aesthetic to the .name field, if it exists, in the fortified 'tbl_ord' object. The previous code that looked for .name now looks for .name_subset, which will be there if augment_ord() has been run first (and if names are found in the model object).

Please let me know again whether it works!

Note: This is not an ideal solution. A better one would be to have a "pointer" (not in the low-level sense) to the original model object that has been cloaked in the 'tbl_ord' class. The ability to access this object from within 'tbl_ord' methods and within the ggplot2 build process would potentially solve many other problems. I'm leaving this issue open until a new one is created for this goal.
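One way to picture such a "pointer" is an attribute that carries the original model alongside the data passed to the plotting code. The sketch below is an assumption-laden illustration, not ordr's implementation: `wrap_model()`, `get_model()`, and the class name "wrapped_ord" are hypothetical, and `prcomp()` on the built-in `USArrests` data stands in for any ordination model.

```r
# Hypothetical wrapper (not ordr's implementation): keep the original model
# reachable from the object that the plotting code receives.
wrap_model <- function(model, coords) {
  structure(coords, model = model, class = c("wrapped_ord", class(coords)))
}
get_model <- function(x) attr(x, "model")

fit <- prcomp(USArrests, scale. = TRUE)
wrapped <- wrap_model(fit, as.data.frame(fit$x))

# Row names can now be recovered from the model itself, on demand.
first_names <- head(rownames(get_model(wrapped)$x), 3)
```

A caveat with this design is that custom attributes can be dropped by operations such as row subsetting, so a real solution would need methods that carry the model through each step.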

@jtr13

jtr13 commented Aug 30, 2021

Sorry, my fault for not providing a full reprex showing that I was using augment_ord(). It's working great now!

    # site-species data frame of Sanderson Galapagos finches data
    library(ordr)
    #> Loading required package: ggplot2
    library(magrittr)
    data(finches, package = "cooccur")
    finches %>% t() %>%
        logisticPCA_ord() %>%
        as_tbl_ord() -> finches_lpca
    finches_lpca %>%
        augment_ord() %>%
        ggbiplot(aes(label = .name), sec.axes = "cols", scale.factor = 50) +
        geom_rows_text_radiate(subset = "Isabella") +
        geom_rows_axis(subset = "Isabella", color = "royalblue3", lwd = .75) +
        geom_rows_axis_text(size = 3, subset = "Isabella", color = "royalblue3",
                            label_dodge = 2) +
        geom_rows_axis_ticks(subset = "Isabella", color = "royalblue3") +
        geom_rows_vector(color = "darkred") +
        geom_cols_point(alpha = .5, color = "royalblue3") +
        ggtitle(
            "Logistic PCA of the Galapagos island finches",
            "Islands (finches) scaled to the primary (secondary) axes"
        ) +
        expand_limits(x = c(-30, 25))

Created on 2021-08-29 by the reprex package (v2.0.1)

Now to be super picky, it doesn't seem that there's a label_dodge parameter for geom_rows_text_radiate().

@corybrunson
Owner Author

Unfortunately, this solution (for character vectors) interferes with the internal calculation of group, so for the sake of urgency I dropped it in 0e41f53 (@jtr13 please take note, with an apology). A "pointer" would remedy it.
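A plausible reading of the interference, offered as an assumption rather than a diagnosis: ggplot2 documents that, by default, group is set to the interaction of all discrete variables in the plot, so an extra character-valued aesthetic such as .name_subset would split rows into separate groups. The base R function `default_group()` below is a rough imitation of that rule, not ggplot2's code.

```r
# Rough imitation of a default grouping rule: one group per combination of
# the discrete (character or factor) columns in the layer data.
default_group <- function(data) {
  discrete <- vapply(
    data,
    function(col) is.character(col) || is.factor(col),
    logical(1)
  )
  if (!any(discrete)) return(rep(1L, nrow(data)))
  as.integer(interaction(data[discrete], drop = TRUE))
}

layer <- data.frame(x = c(1, 2, 3), y = c(3, 1, 2))
group_before <- default_group(layer)  # every row shares one group

layer$.name_subset <- c("a", "b", "c")
group_after <- default_group(layer)   # each row becomes its own group
```

Under this reading, any layer that relies on rows sharing a group would behave differently once the name column is mapped.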

@jtr13

jtr13 commented Sep 22, 2021

No apologies... thanks for the heads-up!

@corybrunson corybrunson removed this from To do in first CRAN submission Oct 14, 2021