Bug when POSIXct variable present #12

Closed
AARON-CLARK opened this issue Nov 8, 2021 · 5 comments
Labels
bug Something isn't working

Comments

@AARON-CLARK

AARON-CLARK commented Nov 8, 2021

Hi,

Thanks for the great work on Tplyr! My team and I have been using it for some time. Recently, we noticed strange behavior whenever POSIXct class variables exist in the "pop data" below. I've created a minimal reproducible example using the CDISC pilot data. The issue goes away if you get rid of the POSIXct variable or convert it to character, which is not an ideal workaround. Could you take a look, or let me know if I'm doing something wrong?

> library(dplyr)
> library(Tplyr)
> 
> # use as needed
> # setwd("path/to/files")
> 
> cdisc_adsl <- haven::read_xpt("data-raw/adsl.xpt")
> cdisc_adae <- haven::read_xpt("data-raw/adae.xpt")
> 
> # Add in POSIXct variable
> adsl2 <- cdisc_adsl %>%
+   mutate(fake_dttm = as.POSIXct("2019-01-01 10:10:10"), origin = "1970-01-01")
> 
> str(adsl2$fake_dttm) 
POSIXct[1:254], format: "2019-01-01 10:10:10" "2019-01-01 10:10:10" "2019-01-01 10:10:10" "2019-01-01 10:10:10" ...
>
> # Make sure TRT01P exists in ADAE
> adae2 <- cdisc_adae %>%
+   left_join(adsl2 %>% select(USUBJID, TRT01P), "USUBJID")
> 
> # Create table
> tp_obj <- Tplyr::tplyr_table(adae2, TRT01P) %>% 
+   Tplyr::set_pop_data(adsl2) %>%
+   Tplyr::add_layer(
+     group_count('Number of subjects with any event') %>% 
+       Tplyr::set_distinct_by(USUBJID) %>% 
+       Tplyr::set_denoms_by(TRT01P)
+   )  
> 
> tp_obj %>% Tplyr::build() # error 
Error in as.POSIXlt.character(x, tz, ...) : 
  character string is not in a standard unambiguous format
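
For reference, the character-conversion workaround mentioned above could look roughly like the sketch below. It uses base R only, and both the helper name posixct_to_character and the small stand-in data frame are hypothetical (they are not part of Tplyr or the CDISC pilot data); it is meant as an illustration rather than a recommended fix.

```r
# Hypothetical helper (not part of Tplyr): convert every POSIXct column
# of a data frame to character so no date-time columns remain.
posixct_to_character <- function(df) {
  is_dttm <- vapply(df, inherits, logical(1), what = "POSIXct")
  if (any(is_dttm)) {
    df[is_dttm] <- lapply(df[is_dttm], format, format = "%Y-%m-%d %H:%M:%S")
  }
  df
}

# Small stand-in for adsl2; the real data comes from adsl.xpt
pop <- data.frame(
  USUBJID = c("01-001", "01-002"),
  fake_dttm = as.POSIXct(rep("2019-01-01 10:10:10", 2), tz = "UTC"),
  stringsAsFactors = FALSE
)

pop_chr <- posixct_to_character(pop)
str(pop_chr$fake_dttm)
```

The converted data frame would then be passed to Tplyr::set_pop_data() in place of the original, at the cost of losing the date-time class on that column.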

My sessionInfo() is below. Note that I ran this on Linux, but my team has encountered the same issue on Windows. Thanks, and let me know if I can provide any other info!

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Tplyr_0.4.2       rstudioapi_0.13   dplyr_1.0.3       readxl_1.3.1     
[5] haven_2.3.1       r2rtf_0.3.1      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        tidyr_1.1.2       prettyunits_1.1.1 ps_1.5.0         
 [5] assertthat_0.2.1  rprojroot_2.0.2   digest_0.6.27     mime_0.9         
 [9] plyr_1.8.6        cellranger_1.1.0  R6_2.5.0          ggplot2_3.3.3    
[13] pillar_1.4.7      rlang_0.4.10      callr_3.5.1       config_0.3.1     
[17] desc_1.2.0        stringr_1.4.0     munsell_0.5.0     tinytex_0.29     
[21] shiny_1.6.0       compiler_4.0.3    httpuv_1.5.5      xfun_0.20        
[25] pkgconfig_2.0.3   pkgbuild_1.2.0    htmltools_0.5.1.1 insight_0.14.5   
[29] tidyselect_1.1.0  tibble_3.0.5      roxygen2_7.1.1    attempt_0.3.1    
[33] fansi_0.4.2       crayon_1.3.4      withr_2.4.1       later_1.1.0.1    
[37] grid_4.0.3        jsonlite_1.7.2    xtable_1.8-4      gtable_0.3.0     
[41] lifecycle_0.2.0   DBI_1.1.1         dockerfiler_0.1.4 magrittr_2.0.1   
[45] scales_1.1.1      cli_2.2.0         stringi_1.5.3     fs_1.5.0         
[49] promises_1.1.1    remotes_2.2.0     testthat_3.0.1    xml2_1.3.2       
[53] ellipsis_0.3.1    generics_0.1.0    vctrs_0.3.6       sjlabelled_1.1.8 
[57] tools_4.0.3       forcats_0.5.1     golem_0.3.1       glue_1.4.2       
[61] purrr_0.3.4       hms_1.0.0         processx_3.4.5    pkgload_1.1.0    
[65] fastmap_1.1.0     yaml_2.2.1        colorspace_2.0-0  gt_0.2.2         
[69] knitr_1.31        usethis_2.0.0    
@AARON-CLARK AARON-CLARK changed the title Bug when POSIXct variables present Bug when POSIXct variable present Nov 8, 2021
@elimillera
Member

@AARON-CLARK Thanks for this issue! I've confirmed I can reproduce.

The issue is on line 811 of count.R: there is an all.equal call that appears to misbehave when it sees a POSIXct variable. This doesn't seem to be an issue with Tplyr itself but with the all.equal.POSIXt method. I can fix this by wrapping the all.equal in a try like so: if (!isTRUE(try(all.equal(pop_data, target), silent = TRUE))) {. If it errors out, it's definitely not equal. I've already tested that it works fine with a data frame with POSIXct values and no pop_data.
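
A standalone sketch of the try-wrapping idea, for illustration only: the helper name same_data and the toy data frames are hypothetical and do not reproduce Tplyr's internals. Depending on the R version, all.equal may either throw an error or return a description of the differences when a POSIXct column meets a character column; the wrapper treats both outcomes as "not equal".

```r
# Hypothetical helper: TRUE only when all.equal() runs without error
# and reports that the two objects are equal.
same_data <- function(pop_data, target) {
  isTRUE(try(all.equal(pop_data, target), silent = TRUE))
}

# Toy stand-ins: the same column position holds character data in one
# data frame and POSIXct data in the other.
target <- data.frame(id = 1:2, when = c("a", "b"), stringsAsFactors = FALSE)
pop_data <- data.frame(
  id = 1:2,
  when = as.POSIXct(c("2019-01-01 10:10:10", "2019-01-02 10:10:10"), tz = "UTC")
)

same_data(target, target)    # a data frame compared with itself
same_data(pop_data, target)  # POSIXct column against a character column
```

Because isTRUE() only accepts a single TRUE, a try-error object or a character vector of differences both fall through to FALSE, which is the behavior the hotfix relies on.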

@mstackhouse Not a super elegant solution but I think it would be a quick bug fix we could push up.

@elimillera elimillera added the bug Something isn't working label Nov 8, 2021
@mstackhouse
Contributor

@elimillera I think this works as a hotfix, because I do see that this works fine with the POSIXct dates if we don't set the pop_data. That said, I'm slightly bothered that adsl2 and adae2 compare without an error outside of the build. For example:

> all.equal(adae2, adsl2)
  [1] "Names: 50 string mismatches"                                                                 
  [2] "Attributes: < Component “row.names”: Numeric: lengths (1191, 254) differ >"                  
  [3] "Length mismatch: comparison on first 51 components"                                          
  [4] "Component “STUDYID”: Lengths (1191, 254) differ (string compare on first 254)"               
  [5] "Component 2: Attributes: < Component “label”: 1 string mismatch >"                           
  [6] "Component 2: Lengths (1191, 254) differ (string compare on first 254)"  

I know I'm not replicating some of the pre-processing, but I'm assuming some processing during the build then triggers this.

Would using identical() work here instead? Because the logic works such that if pop_data is not specified, then pop_data <- target, so would that break anything that you can think of?

Based on that flow:

>  x <- mtcars
>  y <- x
>  identical(x, y)
[1] TRUE

But this will break if target is modified after pop_data is bound as target:

> y <- y %>% mutate(test = TRUE)
> identical(x, y)
[1] FALSE
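
As a further sketch of the identical() idea (toy data and hypothetical variable names, not Tplyr code): identical() always returns a single TRUE or FALSE, so a POSIXct column compared against a character column simply yields FALSE instead of raising a parsing error.

```r
# Toy stand-in for a population data set with a POSIXct column
pop_like <- data.frame(
  id = 1:2,
  when = as.POSIXct(c("2019-01-01 10:10:10", "2019-01-02 10:10:10"), tz = "UTC")
)

# Mirrors the "pop_data <- target" flow: an untouched copy is identical
untouched_copy <- pop_like
same_before <- identical(pop_like, untouched_copy)

# Replace the POSIXct column with its character form and compare again
changed_copy <- pop_like
changed_copy$when <- format(changed_copy$when)
same_after <- identical(pop_like, changed_copy)

same_before
same_after
```

Whether a strict comparison like this is appropriate inside the build depends on how target is processed beforehand, which this sketch does not attempt to model.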

@AARON-CLARK
Author

Hi @elimillera, thanks for the discussion on the hot fix above. Any chance this will get implemented soon? Thanks again!

elimillera pushed a commit that referenced this issue Dec 6, 2021
…for IBMRounding(#14)

fix bug with posix columns(#12) and add documentation for IBMRounding(#14)

Related work items: #14
@elimillera
Member

Hey @AARON-CLARK, this has been updated in our latest push. We'll be pushing out to CRAN today, but you can get the update now with remotes::install_github("atorus-research/Tplyr")

@AARON-CLARK
Author

Excellent! Thanks @elimillera
