Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: --DTC variable coercion to character #161

Closed
adchan11 opened this issue Jun 15, 2023 · 1 comment 路 Fixed by #214
Closed

Bug: --DTC variable coercion to character #161

adchan11 opened this issue Jun 15, 2023 · 1 comment 路 Fixed by #214
Assignees
Labels
bug Something isn't working programming

Comments

@adchan11
Copy link
Collaborator

What happened?

In reference to #145 馃憤

It would be great to have both. --DTC variables coerced to chr when they have an other type and no message when there is no coercion.

Session Information

No response

Reproducible Example

No response

@adchan11 adchan11 added bug Something isn't working programming labels Jun 15, 2023
@cpiraux
Copy link
Collaborator

cpiraux commented Dec 14, 2023

There is a coercion issue when a variable is a numeric date in data frame (df) and has a character type in metadata.

The first_class() function assigns a character type to date(time) variables. @elimillera @bms63, do you remember why date variables are considered as characters?

first_class <- function(x) {
  characterTypes <- getOption("xportr.character_types")
  class_ <- tolower(class(x)[1])
}
table_cols_types <- map(.df, first_class)

image

Original class:
image

Currently, characterTypes has the following values: "character", "char", "text", "date", "posixct", "posixt", "datetime", "time", "partialdate", "partialtime", "partialdatetime", "incompletedatetime", "durationdatetime", "intervaldatetime". It is used in xport_type() to convert both R data type (type.x) and metadata data type (type.y) to numeric or character.

meta_ordered <- left_join(
  data.frame(variable = names(.df), type = unlist(table_cols_types)),
  metadata,
  by = "variable"
) %>%
  mutate(
    # `_character` is used here as a mask for character, in case someone doesn't
    # want 'character' coerced to character
    type.x = if_else(type.x %in% characterTypes, "_character", type.x),
    type.x = if_else(type.x %in% numericTypes | (grepl("DT$|DTM$|TM$", variable) & !is.na(format)),
      "_numeric",
      type.x
    ),
    type.y = if_else(is.na(type.y), type.x, type.y),
    type.y = tolower(type.y),
    type.y = if_else(type.y %in% characterTypes | (grepl("DTC$", variable) & is.na(format)), "_character", type.y),
    type.y = if_else(type.y %in% numericTypes, "_numeric", type.y)
  )

I don鈥檛 think we should use the same list of character types for R data type and metadata type for date(time) variables. When the variable is a numeric date the R data types is "posixct", "posixt", "date" or "time". When the metadata data type is "date", "datetime", "time", then it is a character variable. Considering this difference, we should not use the same values list to convert R data type and metadata data type to numeric or character.
What are your thoughts about that?

@cpiraux cpiraux linked a pull request Jan 10, 2024 that will close this issue
14 tasks
EeethB added a commit that referenced this issue Feb 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working programming
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants