Apparently anytime() not handling July in "%d-%b-%Y" format #33

Closed
shoebodh opened this Issue Nov 30, 2016 · 18 comments

3 participants

@shoebodh

Here is an interesting issue I had with anytime().
Consider this:

library(anytime)
dtchar = c( "30-Jun-2014 23:30:00", "30-Jun-2014 23:45:00", "01-Jul-2014 00:00:00", "01-Jul-2014 00:15:00", "01-Jul-2014 00:30:00")

Here is what anytime() gave me:

anytime(dtchar)
 [1] "2014-06-30 18:30:00 EDT" "2014-06-30 18:45:00 EDT"
 [3] NA                        NA                       
 [5] NA

Not sure if this is a bug, but it seems to me that in this format ("%d-%b-%Y %H:%M:%S"), anytime() is returning NA for the month of July.
Testing with all twelve months shows this:

dtchar2 = c("01-Jan-2014 00:30:00", "01-Feb-2014 00:30:00","01-Mar-2014 00:30:00",
       "01-Apr-2014 00:30:00","01-May-2014 00:30:00", "30-Jun-2014 23:30:00",
  "30-Jul-2014 23:45:00", "01-Aug-2014 00:00:00", "01-Sep-2014 00:15:00",
 "01-Oct-2014 00:30:00", "01-Nov-2014 00:30:00", 
 "01-Dec-2014 00:30:00")

anytime(dtchar2)
[1] "2014-01-01 00:30:00 EST" "2014-02-01 00:30:00 EST"
 [3] "2014-03-01 00:30:00 EST" "2014-04-01 00:30:00 EDT"
 [5] "2014-05-01 00:30:00 EDT" "2014-06-30 23:30:00 EDT"
 [7] NA                        "2014-08-01 00:00:00 EDT"
 [9] "2014-09-01 00:15:00 EDT" "2014-10-01 00:30:00 EDT"
[11] "2014-11-01 00:30:00 EDT" "2014-12-01 00:30:00 EST"

while the base as.POSIXct() handles it correctly

as.POSIXct(dtchar2, format = "%d-%b-%Y %H:%M:%S")
[1] "2014-01-01 00:30:00 EST" "2014-02-01 00:30:00 EST"
 [3] "2014-03-01 00:30:00 EST" "2014-04-01 00:30:00 EDT"
 [5] "2014-05-01 00:30:00 EDT" "2014-06-30 23:30:00 EDT"
 [7] "2014-07-30 23:45:00 EDT" "2014-08-01 00:00:00 EDT"
 [9] "2014-09-01 00:15:00 EDT" "2014-10-01 00:30:00 EDT"
[11] "2014-11-01 00:30:00 EDT" "2014-12-01 00:30:00 EST"
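
For reference, the base R route can be written as a small self-checking snippet. This is only a sketch: it assumes an English or C locale for LC_TIME (because "%b" matches locale-dependent month abbreviations), and tz = "UTC" is an assumption added here so the result does not depend on the local time zone.

```r
# A shortened version of the input above, including the July value.
dtchar2 <- c("30-Jun-2014 23:30:00", "30-Jul-2014 23:45:00",
             "01-Aug-2014 00:00:00")

# "%b" matches abbreviated month names in the current LC_TIME locale,
# so this assumes an English (or C) locale.
# tz = "UTC" is an assumption to keep the result machine-independent.
parsed <- as.POSIXct(dtchar2, format = "%d-%b-%Y %H:%M:%S", tz = "UTC")

# Fail loudly if any element could not be parsed.
stopifnot(!anyNA(parsed))
print(parsed)
```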
@eddelbuettel
Owner
eddelbuettel commented Nov 30, 2016 edited

Works here:

R> dtchar <- c( "30-Jun-2014 23:30:00", "30-Jun-2014 23:45:00", "01-Jul-2014 00:00:00", 
+               "01-Jul-2014 00:15:00", "01-Jul-2014 00:30:00")
R> dtchar
[1] "30-Jun-2014 23:30:00" "30-Jun-2014 23:45:00" "01-Jul-2014 00:00:00" 
[4] "01-Jul-2014 00:15:00" "01-Jul-2014 00:30:00"
R> anytime(dtchar)
[1] "2014-06-30 23:30:00 CDT" "2014-06-30 23:45:00 CDT" "2014-07-01 00:00:00 CDT" 
[4] "2014-07-01 00:15:00 CDT" "2014-07-01 00:30:00 CDT"
R> 

Do you have a locale set or something else that would do that?
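
One quick way to answer the locale question from within R is sketched below. It uses base R only; the variable names are just for illustration.

```r
# Which locale governs date and time names in this session?
lc_time <- Sys.getlocale("LC_TIME")
print(lc_time)

# Month abbreviations as "%b" produces (and expects) them in this locale.
# In an English or C locale the seventh entry should read "Jul".
month_abbr <- format(ISOdate(2014, 1:12, 1), "%b")
print(month_abbr)
```

If the seventh abbreviation is not "Jul", inputs such as "01-Jul-2014" would not be expected to match "%b" in that session.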

@shoebodh

OK, glad to report that the issue seems to be with the RStudio IDE, not the package. From the R command line it behaves correctly, but within RStudio it gives NA.
Thanks.

@shoebodh shoebodh closed this Nov 30, 2016
@eddelbuettel
Owner

What country / language are you in? Did you override anything?

@shoebodh
shoebodh commented Dec 1, 2016

I am on Eastern time zone. USA.
I don't recall doing anything that would change such settings internally.

My report was based on a Windows 7 (64-bit) machine.
Now, with fresh installs of both R and RStudio, I have found that the problem exists on 64-bit R, and it happens regardless of whether R is running within RStudio or on its own. But on 32-bit R it works fine (both in and out of RStudio).

Obviously, this seems to be specific to my machine because it works on yours.
I wonder what would prompt such behavior.
For now I am using a workaround to solve this issue with 64-bit R.

@eddelbuettel
Owner

Now that would be puzzling. I am on Central time, and I will try Windows tomorrow in the virtual machine I have at work.

Here, under Linux, it works in the RStudio console and in the terminal, as you'd expect.

@shoebodh
shoebodh commented Dec 1, 2016 edited

OK. Now it's confirmed that it's specific to my machine. It worked on another Windows machine.
Now it's really puzzling (just to me, anyway). :)

Thanks

@shoebodh
shoebodh commented Dec 3, 2016

Hi there again,
I am sorry for not providing a more comprehensive test last time.
I tested the following code on two other Windows machines and the problem seems to persist on all of them (though a bit differently than on the first machine where I noticed this).

library(anytime)
dtchar = c("01-Jul-2015 00:15:00", "01-Jul-2015 00:30:00")
anytime(dtchar) ## This time it works as expected
## [1] "2015-07-01 00:15:00 EDT" "2015-07-01 00:30:00 EDT"
anytime(dtchar) ### note that I ran the same command twice
## [1] NA NA

So the puzzling thing here is that the first time you execute the command it works, but the next time it doesn't (it returns NA). Weird!
(In my last comment, I ran the anytime() function only once and it gave me the right datetime output, so I thought the problem was limited to that single machine at my work. But later, when I ran the code on my Windows laptop, it worked only on the first execution. From the second time onwards, it gave NA.)

After restarting R, the same thing happens: the right output the first time, NA the next.
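
A check for this "works once, then fails" pattern could be sketched as follows. The names check_repeatable and base_parser are hypothetical helpers, not part of anytime; the base R parser is used only so the snippet runs without the package, and tz = "UTC" is an assumption.

```r
# Call a parser several times on the same input and report whether every
# repetition returned the same, fully parsed (NA-free) result.
# `check_repeatable` is a hypothetical helper for illustration.
check_repeatable <- function(parse_fn, x, times = 3) {
  results <- lapply(seq_len(times), function(i) parse_fn(x))
  same <- all(vapply(results, function(r) identical(r, results[[1]]),
                     logical(1)))
  complete <- !any(vapply(results, anyNA, logical(1)))
  same && complete
}

# Stand-in parser using base R (assumes an English or C LC_TIME locale).
base_parser <- function(x) {
  as.POSIXct(x, format = "%d-%b-%Y %H:%M:%S", tz = "UTC")
}

dtchar <- c("01-Jul-2015 00:15:00", "01-Jul-2015 00:30:00")
print(check_repeatable(base_parser, dtchar))
```

With the package installed, the same check could presumably be run as check_repeatable(anytime::anytime, dtchar); given the behavior described here, it would be expected to report FALSE on an affected machine.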

The following also returns NA:
#### Restart R

dtchar = c("01-Jul-2015 00:15:00", "01-Jul-2015 00:30:00")
anytime(toupper(dtchar)) ## This time it works as expected
## [1] "2015-07-01 00:15:00 EDT" "2015-07-01 00:30:00 EDT"
anytime(dtchar) ## Returns NA
## [1] NA NA

So the issue itself is not big because I've now used anytime() this way in my code. Nonetheless, it seems to be reproducible (at least on the Windows machines I have tried so far, three of them).

Here is the system info of the machine where I wrote this document:

Sys.info() ## And the System locale info
##           sysname           release           version          nodename 
##         "Windows"          "10 x64"     "build 14393" "SACHARYA-PC-DES" 
##           machine             login              user    effective_user 
##          "x86-64"          "Subodh"          "Subodh"          "Subodh"

Sys.getlocale()
## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

On the previous versions, on 32-bit R-3.3.2, and on Linux (167.04), it works fine.

@eddelbuettel
Owner

Weird. I don't have access to Windows right now and will have to check next week.

We fixed something that may be related between anytime 0.1.0 and anytime 0.1.1 -- so just to check, you are on the new version, correct?

@shoebodh
shoebodh commented Dec 3, 2016
@eddelbuettel
Owner

@bobjansen In case you're around, can you try on your Windows box?

Apparently repeated calling of %b formats fails?

@shoebodh
shoebodh commented Dec 3, 2016

Seems to be real. A friend of mine got the same return values on his computer

@eddelbuettel
Owner

You guys would need to debug this. Run anytime:::setDebug(TRUE) before the first call, then compare what is printed at the first call with the subsequent ones. Something must go wrong somewhere, but we need it to be reproducible...
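
A rough sketch of that comparison is below. It assumes the anytime package is installed (the snippet skips the comparison otherwise) and that the debug trace is written to standard output so that capture.output() can collect it; both are assumptions, not something confirmed in this thread.

```r
dtchar <- c("01-Jul-2015 00:15:00", "01-Jul-2015 00:30:00")

# Only run the comparison when the package is available.
has_anytime <- requireNamespace("anytime", quietly = TRUE)

if (has_anytime) {
  anytime:::setDebug(TRUE)
  # Assumption: the debug trace goes to standard output.
  first  <- capture.output(res1 <- anytime::anytime(dtchar))
  second <- capture.output(res2 <- anytime::anytime(dtchar))
  anytime:::setDebug(FALSE)

  # Differences between the two traces point at where the calls diverge.
  print(identical(first, second))
  print(identical(res1, res2))
} else {
  message("anytime is not installed; skipping the debug comparison")
}
```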

@bobjansen
Contributor

@eddelbuettel I should be able to take a close look tomorrow (so within the next 16 hours).

@eddelbuettel
Owner
eddelbuettel commented Dec 3, 2016 edited

I can confirm via R Hub (and the excellent rhub package) that this fails as claimed on 64-bit Windows. Weird.

─  ** running tests for arch 'x64' ...
     Running 'gh_issue_12.R'
     Running 'gh_issue_33.R'
   Warning message:
   running command '"C:/PROGRA~1/R/R-33~1.2/bin/x64/R" CMD BATCH --vanilla  "gh_issue_33.R" "gh_issue_33.Rout"' had status 1 
    ERROR
   Running the tests in 'tests/gh_issue_33.R' failed.
   Last 13 lines of output:
     Type 'q()' to quit R.
     
     > library(anytime)
     > 
     > dtchar <- c("01-Jul-2015 00:15:00", "01-Jul-2015 00:30:00")
     > anytime(dtchar)
     [1] NA NA
     > anytime(dtchar)
     [1] NA NA
     > stopifnot(sum(!is.na(anytime(dtchar))) == 2)
     Error: sum(!is.na(anytime(dtchar))) == 2 is not TRUE
     Execution halted
     > 

File is simple:

library(anytime)

dtchar <- c("01-Jul-2015 00:15:00", "01-Jul-2015 00:30:00")
anytime(dtchar)
anytime(dtchar)
stopifnot(sum(!is.na(anytime(dtchar))) == 2)
stopifnot(sum(!is.na(anytime(dtchar))) == 2)
@eddelbuettel
Owner
eddelbuettel commented Dec 3, 2016 edited

Seems to be related to the string splitter:

** running tests for arch 'x64' ... ERROR
Running the tests in 'tests/gh_issue_33.R' failed.
Last 13 lines of output:
  In: 01-Ju out: 01-Ju and -2015
  s: 01-Ju len: 5 res: 0
  s: -2015 len: 5 res: 0
  One: 01-Ju two: -2015
  before parse: 01-Ju
  before tests: 01-Jul-2015 00:30:00
  In: 01-Ju out: 01-Ju and -2015
  s: 01-Ju len: 5 res: 0
  s: -2015 len: 5 res: 0
  One: 01-Ju two: -2015
  before parse: 01-Ju
  Error: sum(!is.na(anytime(dtchar))) == 2 is not TRUE
  Execution halted
* checking PDF version of manual ... OK

This is just wrong: In: 01-Ju out: 01-Ju and -2015.
It should be In: 01-Jul-2015 00:15:00 out: 01-Jul-2015 and 00:15:00. It is supposed to split on a space, i.e. " ". Grr.
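
To make the expected split concrete, here is an R illustration of the intended result. It is not the package's C++ splitter, only a sketch of what splitting on a single space should produce.

```r
x <- "01-Jul-2015 00:15:00"

# fixed = TRUE treats " " as a literal character, not a regular expression.
parts <- strsplit(x, " ", fixed = TRUE)[[1]]

date_part <- parts[1]  # the date portion handed to the date parser
time_part <- parts[2]  # the time-of-day portion
print(parts)
```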

@eddelbuettel eddelbuettel reopened this Dec 4, 2016
@eddelbuettel
Owner

It's fixed -- using Boost to split strings works. Got that in earlier but had to leave before the test via rhub completed. See this commit for details.

PR to follow tomorrow.

@bobjansen
Contributor

I can at least confirm that with that commit it works on Windows 10 (build 14965), both in RStudio and in standard R.

rhub also returns success on my runit branch: https://builder.r-hub.io/status/anytime_0.1.1.1.tar.gz-33f59686733542e7ab06023d8ae291a5

@eddelbuettel
Owner

Thanks.

I am glad I found the Boost String Algorithms and will take a look at maybe getting another function converted. We may get more robust code, yet still get by without a new dependency on either C++11 (for regexp) or a new library.
