
corx function fails in the presence of *value* labels (not just variable labels) #11

Open
teecrow opened this issue Jul 19, 2023 · 2 comments


teecrow commented Jul 19, 2023

Hi @conig,

Thanks for a great package. I find it to be the best for APA-ready tables, and I used it in my dissertation. However, I've found that I need to remove all label attributes from my dataset before corx will work. (For anyone reading: you can quickly remove all labels by passing your data frame/tibble to labelled::remove_labels() or, if that doesn't work, sjlabelled::remove_all_labels().) A past issue here tackled variable labels (i.e., the description of the variable itself), but not value labels.
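If you'd rather not depend on labelled or sjlabelled, the same workaround can be approximated in base R. This is a minimal sketch, not how either package implements it: strip_labels() is a hypothetical helper, and it assumes variable labels live in a "label" attribute, value labels live in a "labels" attribute, and labelled columns carry the "haven_labelled" class, matching the attributes printed in the reprex output.

```r
# Hypothetical helper (not part of corx, labelled, or sjlabelled).
# Assumes value labels are stored in a "labels" attribute, variable labels
# in a "label" attribute, and labelled columns carry the "haven_labelled" class.
strip_labels <- function(df) {
  df[] <- lapply(df, function(col) {
    if (inherits(col, "haven_labelled")) {
      # unclass() drops the class; as.vector() then drops every remaining
      # attribute ("labels", "label", ...), leaving a plain numeric vector
      return(as.vector(unclass(col)))
    }
    # Other columns (e.g. factors) keep their class; only labels are removed
    attr(col, "label") <- NULL
    attr(col, "labels") <- NULL
    col
  })
  df
}

# Usage: mimic the reprex's labelled column without loading any packages
df <- iris[1:4]
df$Sepal.Length <- structure(
  iris$Sepal.Length,
  label  = "Length of the sepal",
  labels = c(mediumlengh = 5.1, lowerlength = 4.6),
  class  = c("haven_labelled", "vctrs_vctr", "double")
)
clean <- strip_labels(df)
sapply(clean, class)  # every column is now plain "numeric"
```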

The corx::corx() function still fails in the presence of value labels, i.e., label attributes that can be linked to specific numeric values in a numeric vector. These are common in SPSS-imported data.
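The difference between the two kinds of label shows up in the class vector. In the attribute dumps in the console output below, the variable label is only a "label" attribute, while the value labels also give Sepal.Length the class c("haven_labelled", "vctrs_vctr", "double"). The sketch rebuilds both cases in base R and applies a hypothetical strict check, is_plain_numeric(), that accepts only columns whose class is exactly "numeric". This is an assumption about the kind of check behind corx's "All classes must be numeric" error, not its actual code. Note that is.numeric() still returns TRUE for the labelled column, which is consistent with select(where(is.numeric)) keeping it in the reprex.

```r
# A variable label is just an attribute; the implicit class stays "numeric"
with_var_label <- iris$Sepal.Length
attr(with_var_label, "label") <- "Length of the sepal"

# Value labels (as set by labelled/haven) also change the class.
# Built by hand here to mirror the attributes printed in the reprex.
with_value_labels <- structure(
  iris$Sepal.Length,
  labels = c(mediumlengh = 5.1, lowerlength = 4.6),
  class  = c("haven_labelled", "vctrs_vctr", "double")
)

# Hypothetical strict check: the class must be exactly "numeric"
is_plain_numeric <- function(x) identical(class(x), "numeric")

is_plain_numeric(with_var_label)     # TRUE
is_plain_numeric(with_value_labels)  # FALSE
is.numeric(with_value_labels)        # TRUE: the underlying type is double
```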

In the example below, corx works fine in the presence of a variable label for Sepal.Length (attribute $Sepal.Length$label) but fails in the presence of value labels for Sepal.Length (attribute $Sepal.Length$labels):

Here's a reprex

library(tidyverse)
library(corx)
data(iris)

labelled::var_label(iris$Sepal.Length) <- 'Length of the sepal'
# Corx works fine with a *variable* label:
iris |> 
  select(where(is.numeric)) |> 
  corx()
iris |> map(attributes)

iris_new <- iris |> 
  labelled::set_value_labels(
    Sepal.Length = c(mediumlengh = 5.1, lowerlength = 4.6)
    )
# But fails in the presence of *value* labels, 
# which are common in SPSS-imported data:
iris_new |> 
  select(where(is.numeric)) |> 
  corx()
iris_new |> map(attributes)

Console output from Reprex

> library(tidyverse)
── Attaching core tidyverse packages ─────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ───────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors
> library(corx)
> data(iris)
> labelled::var_label(iris$Sepal.Length) <- 'Length of the sepal'
> # Corx works fine with a *variable* label:
> iris |> 
+   select(where(is.numeric)) |> 
+   corx()
corx(data = select(iris, where(is.numeric)))

--------------------------------------------------------------
             Sepal.Length Sepal.Width Petal.Length Petal.Width
--------------------------------------------------------------
Sepal.Length           -         -.12       .87***      .82***
Sepal.Width          -.12          -       -.43***     -.37***
Petal.Length       .87***     -.43***           -       .96***
Petal.Width        .82***     -.37***       .96***          - 
--------------------------------------------------------------
Note. * p < 0.05; ** p < 0.01; *** p < 0.001
> iris |> map(attributes)
$Sepal.Length
$Sepal.Length$label
[1] "Length of the sepal"


$Sepal.Width
NULL

$Petal.Length
NULL

$Petal.Width
NULL

$Species
$Species$levels
[1] "setosa"     "versicolor" "virginica" 

$Species$class
[1] "factor"


> iris_new <- iris |> 
+   labelled::set_value_labels(
+     Sepal.Length = c(mediumlengh = 5.1, lowerlength = 4.6)
+     )
> # But fails in the presence of *value* labels, 
> # which are common in SPSS-imported data:
> iris_new |> 
+   select(where(is.numeric)) |> 
+   corx()
Error: All classes must be numeric. [1] 'Sepal.Length' <hv_,vc_,dbl>.
> iris_new |> map(attributes)
$Sepal.Length
$Sepal.Length$labels
mediumlengh lowerlength 
        5.1         4.6 

$Sepal.Length$label
[1] "Length of the sepal"

$Sepal.Length$class
[1] "haven_labelled" "vctrs_vctr"     "double"        


$Sepal.Width
NULL

$Petal.Length
NULL

$Petal.Width
NULL

$Species
$Species$levels
[1] "setosa"     "versicolor" "virginica" 

$Species$class
[1] "factor"

sessionInfo()

> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] corx_1.0.7.2    lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0  
 [5] dplyr_1.1.2     purrr_1.0.1     readr_2.1.4     tidyr_1.3.0    
 [9] tibble_3.2.1    ggplot2_3.4.2   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.3      crayon_1.5.2      compiler_4.3.1    tidyselect_1.2.0 
 [5] psych_2.3.6       parallel_4.3.1    scales_1.2.1      labelled_2.12.0  
 [9] lattice_0.21-8    R6_2.5.1          generics_0.1.3    munsell_0.5.0    
[13] pillar_1.9.0      tzdb_0.4.0        rlang_1.1.1       utf8_1.2.3       
[17] stringi_1.7.12    timechange_0.2.0  cli_3.6.1         withr_2.5.0      
[21] magrittr_2.0.3    grid_4.3.1        rstudioapi_0.15.0 haven_2.5.3      
[25] hms_1.1.3         nlme_3.1-162      lifecycle_1.0.3   vctrs_0.6.3      
[29] mnormt_2.1.1      glue_1.6.2        fansi_1.0.4       colorspace_2.1-0 
[33] tools_4.3.1       pkgconfig_2.0.3  

I'm happy to provide more info or context. Once this and issue #10 (handling factor -> numeric conversion) are resolved, I really think you should promote the heck out of this package! From the perspective of an academic in social science, it's really great in the options it provides. Thanks and take care.

conig (Owner) commented Aug 3, 2023

Thanks for pointing this out, for providing a reproducible example, and for the kind feedback. I've pushed an update to GitHub, on branch 1.0.7.3, that should resolve this issue. I'll wait a while before pushing to CRAN so I have more time to catch any mistakes.

I'll consider issue #10 more. I'm a little worried that mistakes will be made, as it's not always obvious how factors should be converted to numeric and it's easy to miss warnings. Perhaps requiring users to opt in via an argument would be safer.

remotes::install_github("conig/corx@1.0.7.3")
#> Skipping install of 'corx' from a github remote, the SHA1 (46147a4d) has not changed since last install.
#>   Use `force = TRUE` to force installation

library(corx)

iris_new <- iris |> 
  labelled::set_value_labels(
    Sepal.Length = c(mediumlengh = 5.1, lowerlength = 4.6)
  )
# Previously failed in the presence of *value* labels,
# which are common in SPSS-imported data:
iris_new |> 
  dplyr::select(where(is.numeric)) |> 
  corx()
#> corx(data = dplyr::select(iris_new, where(is.numeric)))
#> 
#> --------------------------------------------------------------
#>              Sepal.Length Sepal.Width Petal.Length Petal.Width
#> --------------------------------------------------------------
#> Sepal.Length           -         -.12       .87***      .82***
#> Sepal.Width          -.12          -       -.43***     -.37***
#> Petal.Length       .87***     -.43***           -       .96***
#> Petal.Width        .82***     -.37***       .96***          - 
#> --------------------------------------------------------------
#> Note. * p < 0.05; ** p < 0.01; *** p < 0.001
Created on 2023-08-03 with [reprex v2.0.2](https://reprex.tidyverse.org/)
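To make the opt-in idea for issue #10 concrete, here is a minimal sketch of a preprocessing step that refuses factor columns unless the caller asks for conversion, and warns when it does convert. factors_to_numeric() and its convert_factors argument are hypothetical, not part of corx. The usage lines show why silent conversion is risky: as.integer() on a factor returns level positions, so a factor with numeric-looking labels such as "10" and "20" becomes 1 and 2.

```r
# Hypothetical helper illustrating an opt-in argument for factor columns.
# Not part of corx; the name convert_factors is an assumption.
factors_to_numeric <- function(df, convert_factors = FALSE) {
  is_fac <- vapply(df, is.factor, logical(1))
  if (!any(is_fac)) return(df)
  bad <- names(df)[is_fac]
  if (!convert_factors) {
    # Default: fail loudly rather than guess how to convert
    stop("Factor columns found: ", paste(bad, collapse = ", "),
         ". Set convert_factors = TRUE to use their level codes.",
         call. = FALSE)
  }
  warning("Converted factor columns to level codes (1, 2, ...): ",
          paste(bad, collapse = ", "), call. = FALSE)
  # as.integer() on a factor returns level positions, not the label values
  df[is_fac] <- lapply(df[is_fac], as.integer)
  df
}

# Usage
f <- data.frame(score = c(1.5, 2.5, 3.5), dose = factor(c("10", "20", "10")))
try(factors_to_numeric(f))  # errors by default
out <- suppressWarnings(factors_to_numeric(f, convert_factors = TRUE))
out$dose  # 1 2 1, not 10 20 10
```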

teecrow (Author) commented Aug 3, 2023

Thanks for tackling this! And yes, I think issue #10 is a tougher one. Perhaps instead of, or in addition to, a warning, the resulting corx object could store a 'warnings' vector and/or the default print method could be modified. That way, right above the correlation matrix, the user would see a message making clear that a conversion was done and why it might give unanticipated or misleading results. For better or worse, other correlation packages (to my knowledge) don't error when factors are present, but then again perhaps they should.
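A minimal sketch of that suggestion, assuming a simple S3 design: the result object carries a character vector of notes, and its print method shows them above the matrix. new_cor_result(), the cor_result_sketch class, and its print method are hypothetical illustrations, not corx's actual object or print method, and stats::cor() stands in for corx's own computation.

```r
# Hypothetical constructor: store the correlation matrix plus any notes
# about conversions performed on the input data.
new_cor_result <- function(data, notes = character()) {
  structure(
    list(r = stats::cor(data), notes = notes),
    class = "cor_result_sketch"
  )
}

# Hypothetical print method: notes come first so they sit right above
# the matrix instead of scrolling past as console warnings.
print.cor_result_sketch <- function(x, digits = 2, ...) {
  if (length(x$notes) > 0) {
    cat(paste0("Note: ", x$notes, "\n"), sep = "")
  }
  print(round(x$r, digits))
  invisible(x)
}

# Usage
d <- iris
d$Species <- as.integer(d$Species)  # factor -> level codes 1, 2, 3
res <- new_cor_result(
  d,
  notes = "Species was converted from a factor to level codes 1-3; interpret with care."
)
res  # the note is printed above the correlation matrix
```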
