Error: Mutate can't convert haven labelled to character #28

Open
mbcann01 opened this issue Aug 9, 2022 · 0 comments
Labels
bug Something isn't working

Comments

@mbcann01
Member

mbcann01 commented Aug 9, 2022

Overview

While working on the L2C codebook, I got this error. I went into the codebookr code to try to fix it, but nothing I tried worked or even made sense. Then I closed RStudio, reopened it, and tried again, and the code started working. I have not identified what changed, so I want to keep a record of what I already tried in case the error comes back.

Test file

I created a test qmd file for debugging. Here is what it contained.

library(dplyr, warn.conflicts = FALSE)
library(codebookr)

Load L2C data

path <- "/Users/bradcannell/Library/CloudStorage/OneDrive-TheUniversityofTexasHealthScienceCenteratHouston/01_research/L2C Teams/Participant Data/R data/combined_participant_data.rds"
combined_participant_data <- readr::read_rds(path)

Keep only the variables from the L2C data that seemed to be causing trouble.

df <- combined_participant_data %>% 
  # select(835:840)
  select(835)
print(codebook(df), "test.docx")

The print code above is where I was getting the error. But, here is the really interesting part. When I run devtools::load_all(), the code chunk above works, but when I run devtools::install() I get the same error as when I just load the package with library(codebookr).

I found a post on R Community that seemed relevant (https://community.rstudio.com/t/devtools-load-all-vs-install/51787/3). It said,

One possible reason is that with devtools::load_all() you make all the functions in your package available to the current R session, even those which are not exported via NAMESPACE. With library(package_name) you only get those functions that are exported via NAMESPACE.

It also said,

Keep in mind that library(package_name) is The Truth which devtools::load_all() "roughly simulates".

The person who posted the comment said they could recreate their error by running devtools::load_all(export_all = FALSE) instead of just devtools::load_all(). So, I tried to do the same. My thought was that the Roxygen tag (@keywords internal) I added to the functions I didn't want to appear on the pkgdown function list might be the issue. However, this did not reproduce the error: the code worked with devtools::load_all(export_all = FALSE) and still failed with devtools::install(). I went so far as to add @export to every R file, run devtools::document() again, and then run devtools::install() again. I was still getting the same error.
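A quick way to check the exported vs. internal distinction described in that post is to ask the namespace directly. The sketch below is only an illustration: describe_function() is a hypothetical helper (not part of devtools or codebookr), and "stats" and "median" are stand-ins so the code runs without codebookr installed.

```r
# Report whether `fn` is exported from `pkg` or only defined internally.
# describe_function() is a hypothetical helper written for this example.
describe_function <- function(fn, pkg) {
  ns <- asNamespace(pkg)
  list(
    # TRUE if library(pkg) would make `fn` visible to the user
    exported = fn %in% getNamespaceExports(pkg),
    # TRUE if `fn` exists in the namespace at all (exported or internal)
    defined_in_namespace = exists(fn, envir = ns, inherits = FALSE)
  )
}

# "stats" and "median" are stand-ins; substitute "codebookr" and
# "cb_summary_stats_few_cats" to check the function from this issue.
info <- describe_function("median", "stats")
print(info)
```

If a function is defined in the namespace but not exported, other functions in the same package can still call it, so a missing export alone would not normally break an internal call.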

Which version of cb_summary_stats_few_cats is being used?

Next, I tried to understand if there were any differences in the code for cb_summary_stats_few_cats() when I was using devtools::load_all(export_all = FALSE) vs. devtools::install(). So I:

  1. Changed cb_summary_stats_few_cats() to return summary before anything except n is calculated.
  2. Restarted R.
  3. Ran devtools::load_all(export_all = FALSE).
  4. Checked the result of running cb_summary_stats_few_cats() on its own.
  5. Checked the result of running codebook().
# Here is the change I made to cb_summary_stats_few_cats only
summary <- df %>%
    dplyr::count(.data[[.x]]) %>%
    # Rename the first column from the name of the variable being analyzed to
    # "cat"
    dplyr::rename(cat = 1)

    return(summary)

    # I commented out all the code below.

    # # Change the category label for missing values from NA to "Missing"
    # # If .x is a factor, then replace_na() won't work. Have to change to
    # # character first.
    # dplyr::mutate(
    #   cat = as.character(cat),
    #   cat = tidyr::replace_na(cat, "Missing")
    # ) %>%
    # # Calculate the cumulative total and percentage
    # dplyr::mutate(
    #   cum_freq = cumsum(n),
    #   prop     = n / max(cum_freq),
    #   percent  = prop * 100
    # ) %>%
    # # Keep columns of interest
    # dplyr::select(cat, n, cum_freq, percent) %>%
    # # Format numeric results
    # dplyr::mutate(
    #   dplyr::across(
    #     .cols = c(n, cum_freq, percent),
    #     .fns  = ~ format(.x, nsmall = digits, big.mark = ",")
    #   )
    # )

Then I ran just cb_summary_stats_few_cats().

cb_summary_stats_few_cats(df, "any_health_insurance", digits = 2)

This code chunk returned the expected dplyr summary table with only the cat and n columns. The cat column is a <S3: haven_labelled>. So, the change I made to the code was clearly picked up when I loaded the package files with devtools::load_all(export_all = FALSE). Then, I ran the chunk below.

print(codebook(df), "test.docx")

However, this code returned a codebook document with a summary table that also included cumulative frequency and percent (the code I commented out above). Therefore, the codebook() function is clearly using some other version of cb_summary_stats_few_cats(), but why and which one?
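One general R behavior that can produce this kind of mismatch is that a function looks up the helpers it calls in its own enclosing environment first, not in the environment of whoever calls it. The sketch below imitates that with a plain environment standing in for a package namespace. All names here (pkg_ns, helper, outer_fn) are hypothetical, and this is not a confirmed explanation of what happened in this issue.

```r
# A stand-in for a package namespace. For a real package, R creates this
# environment when the package is loaded.
pkg_ns <- new.env()

local({
  # The "installed" helper that the package function calls
  helper <- function() "installed version"
  # The package function; its enclosing environment is pkg_ns
  outer_fn <- function() helper()
}, envir = pkg_ns)

# An edited helper with the same name, defined in the calling environment
helper <- function() "edited version"

direct_result  <- helper()           # finds the edited helper
wrapped_result <- pkg_ns$outer_fn()  # finds the helper inside pkg_ns

print(direct_result)
print(wrapped_result)
```

Calling the helper directly returns "edited version", while calling it through outer_fn() returns "installed version". For the real package, environment(codebook) shows which environment codebook() resolves its helpers in.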

Before moving on, I checked really quickly to see if I could reproduce the error outside of my codebookr code.

df %>%
    dplyr::count(any_health_insurance) %>%
    # Rename the first column from the name of the variable being analyzed to
    # "cat"
    dplyr::rename(cat = 1) %>%
    # Change the category label for missing values from NA to "Missing"
    # If .x is a factor, then replace_na() won't work. Have to change to
    # character first.
    dplyr::mutate(
      cat = as.character(cat),
      cat = tidyr::replace_na(cat, "Missing")
    ) %>%
    # Calculate the cumulative total and percentage
    dplyr::mutate(
      cum_freq = cumsum(n),
      prop     = n / max(cum_freq),
      percent  = prop * 100
    ) %>%
    # Keep columns of interest
    dplyr::select(cat, n, cum_freq, percent) %>%
    # Format numeric results
    dplyr::mutate(
      dplyr::across(
        .cols = c(n, cum_freq, percent),
        .fns  = ~ format(.x, nsmall = 2, big.mark = ",")
      )
    )

I could not. The code above works fine. So, what version of the code is devtools::install() using and why isn't it working? To try to answer this, I decided to load the package with library() and then view the internals of the function.

  1. Restart R
  2. library(codebookr)
  3. View the internals of cb_summary_stats_few_cats
library(dplyr, warn.conflicts = FALSE)
library(codebookr)
cb_summary_stats_few_cats

However, the code returned in the code chunk above looks correct (i.e., like the code that runs when using devtools::load_all()). So, I'm still not sure what the issue is.
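Besides printing the function, it may help to confirm which installed copy of the package library() is picking up, since more than one library path can hold a copy. The sketch below uses only base R and utils. describe_install() is a hypothetical helper, and "stats" is a stand-in for "codebookr" so the example runs anywhere. Whether a stale install was involved here is unknown.

```r
# Summarize where a package is installed and when it was built.
# describe_install() is a hypothetical helper written for this example.
describe_install <- function(pkg) {
  list(
    path    = find.package(pkg),                         # directory R loads from
    version = as.character(utils::packageVersion(pkg)),  # installed version
    built   = utils::packageDescription(pkg)$Built       # build stamp from DESCRIPTION
  )
}

# "stats" is a stand-in; substitute "codebookr".
install_info <- describe_install("stats")
print(install_info)

# All library locations R searches, in order
print(.libPaths())
```

Comparing the path and build stamp before and after devtools::install() would show whether the session is using the freshly installed copy.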

For Stack Overflow

At that point, I decided to create a post on Stack Overflow. I wanted to create a reproducible example, so my plan was to ask responders to clone my codebookr repo and use the haven labelled study data.

Question: devtools::load_all() works as expected, but devtools::install() does not

library(dplyr, warn.conflicts = FALSE)
library(codebookr)
study <- haven::read_dta("/Users/bradcannell/Dropbox/R/Packages/codebookr/inst/extdata/study.dta")
print(codebook(study), "test.docx")

However, this code ran just fine. Then I closed out RStudio entirely, went back to my L2C project, and then the codebook started running there too. I have no idea what changed. However, I'm saving this issue in case it comes up again.

mbcann01 added the bug label Aug 9, 2022
Projects
Status: Ideas/Eventually/Maybe