keepLeadingZeros interferes with date recognition #4869

Open
rmcd1024 opened this issue Jan 9, 2021 · 3 comments

rmcd1024 commented Jan 9, 2021

keepLeadingZeros seems to interfere with automatic date recognition: the date column is recognized as a date only when keepLeadingZeros = FALSE. If this is by design, I couldn't find it documented.

Compare the output of these two lines:

str(fread("date, id\n2020-01-05,  00001\n2020-01-10,  00003",  keepLeadingZeros = TRUE ))
Classes ‘data.table’ and 'data.frame':	2 obs. of  2 variables:
 $ date: chr  "2020-01-05" "2020-01-10"
 $ id  : chr  "00001" "00003"
 - attr(*, ".internal.selfref")=<externalptr> 

str(fread("date, id\n2020-01-05,  00001\n2020-01-10,  00003",  keepLeadingZeros = FALSE))
Classes ‘data.table’ and 'data.frame':	2 obs. of  2 variables:
 $ date: IDate, format: "2020-01-05" "2020-01-10"
 $ id  : int  1 3
 - attr(*, ".internal.selfref")=<externalptr> 

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.13.6

loaded via a namespace (and not attached):
[1] compiler_4.0.3 tools_4.0.3    yaml_2.2.1 
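A possible workaround in the meantime, sketched here as an assumption rather than anything confirmed by the maintainers: if only the id column needs its leading zeros, leave keepLeadingZeros at its default and force just that column to character through fread's colClasses argument. Alternatively, read with keepLeadingZeros = TRUE and convert the date column yourself with as.IDate(). The object names txt, dt1 and dt2 are illustrative.

```r
library(data.table)

# Same input as the reproducible example above.
txt <- "date, id\n2020-01-05,  00001\n2020-01-10,  00003"

# Workaround 1: keep the default keepLeadingZeros = FALSE so 'date' is still
# auto-detected, and read only 'id' as character so its text (zeros included)
# is kept as written.
dt1 <- fread(txt, colClasses = c(id = "character"))
str(dt1)

# Workaround 2: read with keepLeadingZeros = TRUE, then convert the date
# column explicitly from character to IDate.
dt2 <- fread(txt, keepLeadingZeros = TRUE)
dt2[, date := as.IDate(date)]
str(dt2)
```

The first form keeps type detection for every other column; the second may be more convenient when many columns carry leading zeros and listing them all in colClasses would be tedious.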

@Feakster

Just came here to report the same issue in data.table version 1.13.6.

The problem can be reproduced using one of UK Biobank's schemas:

keepLeadingZeros = FALSE

dt <- data.table::fread(
  input = "https://biobank.ndph.ox.ac.uk/showcase/scdown.cgi?fmt=txt&id=8",
  sep = "\t",
  quote = "",
  header = TRUE,
  na.strings = c("", "NA"),
  keepLeadingZeros = FALSE
)
str(dt)
'data.frame':	10 obs. of  4 variables:
 $ encoding_id   : int  272 586 586 586 819 819 819 819 819 1313
 $ value         : IDate, format: "1900-01-01" "1910-01-01" "1920-01-01" ...
 $ meaning       : chr  "Date is unknown" "Prefer not to answer" "Do not know" "Not applicable" ...
 $ showcase_order: int  0 1 2 3 1 2 3 4 5 1

keepLeadingZeros = TRUE

dt <- data.table::fread(
  input = "https://biobank.ndph.ox.ac.uk/showcase/scdown.cgi?fmt=txt&id=8",
  sep = "\t",
  quote = "",
  header = TRUE,
  na.strings = c("", "NA"),
  keepLeadingZeros = TRUE
)
str(dt)
Classes ‘data.table’ and 'data.frame':	10 obs. of  4 variables:
 $ encoding_id   : int  272 586 586 586 819 819 819 819 819 1313
 $ value         : chr  "1900-01-01" "1910-01-01" "1920-01-01" "1930-01-01" ...
 $ meaning       : chr  "Date is unknown" "Prefer not to answer" "Do not know" "Not applicable" ...
 $ showcase_order: int  0 1 2 3 1 2 3 4 5 1
 - attr(*, ".internal.selfref")=<externalptr> 

sessionInfo() output:

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Manjaro Linux

Matrix products: default
BLAS:   /usr/lib/libopenblasp-r0.3.13.so
LAPACK: /usr/lib/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ukbschemas_0.3.0 testthat_3.0.1  

loaded via a namespace (and not attached):
 [1] xfun_0.20         purrr_0.3.4       vctrs_0.3.6       blob_1.2.1        rlang_0.4.10     
 [6] pkgbuild_1.2.0    pillar_1.4.7      glue_1.4.2        withr_2.4.1       DBI_1.1.1        
[11] bit64_4.0.5       lifecycle_0.2.0   stringr_1.4.0     commonmark_1.7    memoise_2.0.0    
[16] knitr_1.31        callr_3.5.1       fastmap_1.1.0     ps_1.5.0          parallel_4.0.3   
[21] Rcpp_1.0.6        readr_1.4.0       cachem_1.0.3      desc_1.2.0        pkgload_1.1.0    
[26] bit_4.0.4         hms_1.0.0         digest_0.6.27     stringi_1.5.3     processx_3.4.5   
[31] rprojroot_2.0.2   cli_2.3.0         tools_4.0.3       magrittr_2.0.1    tibble_3.0.6     
[36] RSQLite_2.2.3     crayon_1.4.1      pkgconfig_2.0.3   ellipsis_0.3.1    data.table_1.13.6
[41] xml2_1.3.2        prettyunits_1.1.1 assertthat_0.2.1  roxygen2_7.1.1    rstudioapi_0.13  
[46] R6_2.5.0          compiler_4.0.3

r2evans commented Feb 23, 2022

It's not just Date: this affects POSIXct as well. From https://stackoverflow.com/a/71241926/3358272:

withr::with_options(
  list(datatable.keepLeadingZeros=FALSE), 
  fread(text=c("now","2020-07-24T10:11:12.134Z"), sep=",")
)
#                        now
#                     <POSc>
# 1: 2020-07-24 10:11:12.134

withr::with_options(
  list(datatable.keepLeadingZeros=TRUE), 
  fread(text=c("now","2020-07-24T10:11:12.134Z"), sep=",")
)
#                         now
#                      <char>
# 1: 2020-07-24T10:11:12.134Z

(Wish I had found this issue before ...)
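For the POSIXct case, a similar workaround (again a hedged sketch, not a maintainer-endorsed fix) is to keep keepLeadingZeros = TRUE and parse the timestamp column explicitly afterwards. It assumes the values are ISO-8601 timestamps in UTC with a literal trailing Z, as in the example above; the format string would need adjusting for other layouts, and the name dt is illustrative.

```r
library(data.table)

# With keepLeadingZeros = TRUE the 'now' column comes back as character
# (as shown above), so parse it explicitly.
dt <- fread(text = c("now", "2020-07-24T10:11:12.134Z"), sep = ",",
            keepLeadingZeros = TRUE)

# %OS reads seconds including the fractional part; the trailing "Z" in the
# format is matched literally, and tz = "UTC" reflects what that Z means.
dt[, now := as.POSIXct(now, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")]
str(dt)
```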
