
Parsing single digit months #24

Closed
OmarGonD opened this issue Oct 24, 2016 · 12 comments

Comments

@OmarGonD

OmarGonD commented Oct 24, 2016

I wanted to answer this question on Stack Overflow using "anytime".

These are the dates they have. The format is "month/day/full year", with single-digit months and days not zero-padded:

2/10/2016  
4/4/2016  
5/8/2016  
10/1/2016

However, anydate() only works on the first entry, not the rest. In the package's PDF manual on CRAN, we find:

Issues
The Boost Date_Time library cannot parse single digit months or days. So while ‘2016/09/02’ works (as expected), ‘2016/9/2’ will not. Other non-standard formats may also fail.
There is a known issue (discussed at length in issue ticket 5) where Australian times are off by an hour. This seems to affect only Windows, not Linux.

So, apparently this is a known bug/issue. Are there any workarounds?

Or should we just use as.POSIXct(df$final_date, format = "%m/%d/%Y")?

anydate("2/10/2016")
[1] "2016-02-10"
anydate("4/4/2016")
[1] NA
anydate("5/8/2016")
[1] NA
anydate("10/1/2016")
[1] NA
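For reference, base R handles these strings once the format is spelled out. The strings are month-first, so the format is "%m/%d/%Y", and strptime-style %m and %d accept one or two digits on input. A minimal base R sketch (the variable names are illustrative only):

```r
# The four month/day/year strings from above; months and days are not zero-padded
dates <- c("2/10/2016", "4/4/2016", "5/8/2016", "10/1/2016")

# With an explicit format, single-digit %m and %d fields are accepted
parsed <- as.Date(dates, format = "%m/%d/%Y")
print(parsed)
# [1] "2016-02-10" "2016-04-04" "2016-05-08" "2016-10-01"
```

Since these values have no time component, as.Date() is used here; as.POSIXct() with the same format string would work the same way for date-times.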

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Spanish_Peru.1252  LC_CTYPE=Spanish_Peru.1252   
[3] LC_MONETARY=Spanish_Peru.1252 LC_NUMERIC=C                 
[5] LC_TIME=Spanish_Peru.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] anytime_0.0.4 dplyr_0.5.0  

loaded via a namespace (and not attached):
[1] lazyeval_0.2.0  magrittr_1.5    R6_2.1.3       
[4] assertthat_0.1  rsconnect_0.4.3 DBI_0.5        
[7] tools_3.3.1     tibble_1.2      Rcpp_0.12.7
@eddelbuettel
Owner

eddelbuettel commented Oct 24, 2016

It is a shortcoming in Boost Date_Time. It just doesn't do it:

R> anytime:::testFormat("%m/%d/%Y", "10/11/2016")
[1] "2016-10-11 CDT"
R> anytime:::testFormat("%m/%d/%Y", "2/3/2016")
[1] NA
R> 

Sadly, there is nothing we can do here. My wording is a little off on the help page but this is documented. So I'll close this, ok?

The "good news" is that base R does parse it, just not as intended:

R> as.Date("2/3/2016")
[1] "2-03-20"
R> 
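To see why: without an explicit format, as.Date() tries its default tryFormats, "%Y-%m-%d" and then "%Y/%m/%d". The second one matches "2/3/2016" as year 2, month 3, day 20, and the leftover characters are ignored. A short base R check of that reading, next to the intended month/day/year parse:

```r
x <- "2/3/2016"

# Default parse: falls through to the "%Y/%m/%d" candidate format
guessed <- as.Date(x)
stopifnot(guessed == as.Date(x, format = "%Y/%m/%d"))

# The year component is 2, not 2016
print(as.integer(format(guessed, "%Y")))

# Supplying the intended month/day/year format gives the expected date
intended <- as.Date(x, format = "%m/%d/%Y")
print(intended)
# [1] "2016-02-03"
```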

@OmarGonD
Author

Yes, sure. No problem.

@eddelbuettel
Owner

Trust me, I find it annoying as hell too. The hope for anytime was to parse all sane formats, but arguably single-digit fields are not one of them. You could write a regexp which transforms a single-digit month or day into two digits and then parse that...
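A minimal sketch of that regexp idea in base R. The helper name pad_single_digits() is hypothetical (it is not part of anytime): it prefixes a zero to every number that consists of exactly one digit, using PCRE lookarounds.

```r
# Hypothetical helper: zero-pad every isolated single digit,
# so "4/4/2016" becomes "04/04/2016" and "9:15" becomes "09:15".
pad_single_digits <- function(x) {
  # (?<![0-9]) and (?![0-9]): the digit is neither preceded nor followed by a digit
  gsub("(?<![0-9])([0-9])(?![0-9])", "0\\1", x, perl = TRUE)
}

inp <- c("2/10/2016", "4/4/2016", "5/8/2016", "10/1/2016")
padded <- pad_single_digits(inp)
print(padded)
# [1] "02/10/2016" "04/04/2016" "05/08/2016" "10/01/2016"
```

The padded strings can then be handed to the parser. Because the pattern pads any isolated digit, it would also touch single-digit years or other lone digits, so it is only a sketch for the simple month/day/year case.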

@alexanu

alexanu commented Sep 20, 2018

Thank you for the explanation. I'm having the same problem, but with hours (it doesn't recognize e.g. 9:15, but does recognize 09:15).

@eddelbuettel
Owner

Try switching to the R parser, which should handle it:

R> anytime("2018-9-20 7:24:30", useR=TRUE)
[1] "2018-09-20 07:24:30 CDT"
R> 

It's too bad the Boost one doesn't by default, but such is life.

@alexanu

alexanu commented Sep 20, 2018

Thank you for the quick reply. Unfortunately it didn't work for me on the first attempt. I didn't want to dig into why, so I used lubridate::parse_date_time2() with the orders "dmY HM", which was also pretty fast for 2 million rows.

@eddelbuettel
Owner

lubridate, in response to the competition, also switched to a C++-based backend, sadly by copying the library code I already had in the RcppCCTZ package. Our main difference is ease of use, i.e. the absence of a format requirement. If you bother with a format, you can just use strptime() in base R...
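A minimal base R sketch of the strptime() route for the "dmY HM" case mentioned above. The input string and the UTC time zone are assumptions for illustration; on input, %d, %m and %H each accept one or two digits.

```r
# Day/month/year hour:minute, with a single-digit month and hour (illustrative input)
x <- "20/9/2018 9:15"

# An explicit format handles the missing zero padding; tz is an assumption
parsed <- as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M", tz = "UTC"))
print(format(parsed, "%Y-%m-%d %H:%M"))
# [1] "2018-09-20 09:15"
```

strptime() is vectorised, so the same call applies to a whole column at once.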

@alexanu

alexanu commented Sep 20, 2018

:) Thank you! You are doing really cool stuff. Unfortunately, I still do not understand all your packages, but maybe one day :)

@eddelbuettel
Owner

Good news: this now works in master (and hence will be in the next release, probably at the end of the month):

 R> inp <- c("2/10/2016", "4/4/2016", "5/8/2016", "10/1/2016")
 R> library(anytime)
 R> anytime(inp)
 [1] "2016-02-10 CST" "2016-04-04 CDT" "2016-05-08 CDT"
 [4] "2016-10-01 CDT"
 R>

@grigory93

grigory93 commented Nov 8, 2020

Unfortunately, it still won't work for strings containing a single-digit hour, like 9/27/2017 9:00, which results in 2017-09-27 00:00:00.

@eddelbuettel
Owner

There are no claims in the documentation that incomplete or non-standard formats are supported.

The gold standard is still ISO 8601, so hell yeah to four-digit years, two-digit days, two-digit hours, year before month before day, and so on. We rely on a standard parser from Boost (plus, for good measure, R's own if you want). If you have exotic or likely error-prone formats, you may have to do something custom.
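One form "something custom" could take, as a minimal base R sketch: a hypothetical to_iso8601() helper (not part of anytime) that only understands the "m/d/Y H:M" layout from the previous comment and rewrites it as an ISO 8601-style timestamp with zero-padded fields and seconds appended. Any other layout yields NA.

```r
# Hypothetical converter for one specific layout, "m/d/Y H:M" with optional
# single-digit fields, rewritten as "YYYY-MM-DD HH:MM:SS".
to_iso8601 <- function(x) {
  pattern <- "^([0-9]{1,2})/([0-9]{1,2})/([0-9]{4}) ([0-9]{1,2}):([0-9]{2})$"
  parts <- regmatches(x, regexec(pattern, x))
  vapply(parts, function(p) {
    if (length(p) == 0) return(NA_character_)  # no match: return NA
    n <- as.integer(p[-1])                      # month, day, year, hour, minute
    sprintf("%04d-%02d-%02d %02d:%02d:00", n[3], n[1], n[2], n[4], n[5])
  }, character(1))
}

print(to_iso8601(c("9/27/2017 9:00", "12/1/2017 14:30", "not a date")))
```

The rewritten strings use the year-month-day, two-digit-field form recommended above, so a standard parser such as anytime() or as.POSIXct() should have an easier time with them. The helper does no range checking on the fields.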

@eddelbuettel
Owner

eddelbuettel commented Nov 8, 2020

The lack of seconds hurts as well. If you add them, at least the R parser copes:

R> library(anytime)
R> anytime("9/27/2017 9:00:00 ", useR=TRUE)
[1] "2017-09-27 09:00:00 CDT"
R> anytime("9/27/2017 9:00:00 ")
[1] "2017-09-27 CDT"
R> 


4 participants