Milliseconds are not parsed for certain formats #29

Closed
bobjansen opened this Issue Oct 30, 2016 · 26 comments

Projects

None yet

2 participants

@bobjansen
Contributor
bobjansen commented Oct 30, 2016 edited

When I run (from a fresh install from CRAN and when running on my version of master)

options(digits.secs = 6)
dput(anytime(c("20160901 101112.345678", "01Sep2016 101112.345678")))
structure(c(1472688672.34568, 1472688672), 
          class = c("POSIXct","POSIXt"), tzone = "Australia/NSW")

the milliseconds seem not to be parsed. Something that doesn't seem to happen in tests/allFormats.Rout.save.

@bobjansen bobjansen changed the title from Milliseconds are parsed for certain formats to Milliseconds are not parsed for certain formats Oct 30, 2016
@eddelbuettel
Owner
eddelbuettel commented Oct 30, 2016 edited

Thanks. There must be a %F or %f missing in the formats vector. Do you have time to chase it?

Easier to see this way:

R> anytime(c("20160101 101112.345678", "01Sep2016 101112.345678"))
[1] "2016-01-01 10:11:12.345678 CST" "2016-09-01 10:11:12.000000 CDT"
R> 

(and tests/allFormats.Rout.save is stale/static/not updated)

@bobjansen
Contributor

I'm on it. I'll send a pull ASAP.

@eddelbuettel
Owner

Wicked. Was thinking about rolling a new release, so this will make it even better!

@eddelbuettel
Owner

"That format" is probably also missing in tests/allFormats.R or we would have eye-balled it.

So my bad...

@bobjansen
Contributor

I have this branch now: https://github.com/bobjansen/anytime/tree/runit which allows some quicker experimentation and found that this issue is more annoying than I first thought: the format check is greedy so that we have:

> anytime:::testFormat("%d%b%Y %H:%M:%S", "01Sep2016 101112")
[1] "2016-09-01 10:11:00 AEST"

which is wrong but also

> dput(anytime:::testFormat("%d%b%Y %H%M%S", "01Sep2016 101112.345678"))
structure(1472688672, class = c("POSIXct", "POSIXt"), tzone = "")  # Too greedy
> dput(anytime:::testFormat("%d%b%Y %H%M%S%f", "01Sep2016 101112.345678"))
structure(1472688672.34568, class = c("POSIXct", "POSIXt"), tzone = "")

which makes format ordering a bit complicated.

@bobjansen
Contributor
@eddelbuettel
Owner
eddelbuettel commented Oct 30, 2016 edited

I may know what that is. It is an actual error.

Which I caught for dates, but not for dates with subsequent %H:%M:%S.

See this: https://github.com/eddelbuettel/anytime/blob/master/src/anytime.cpp#L213-L214

We may need to (very carefully) generalise that to an || substr(....)

@eddelbuettel
Owner

Ie this is also just plain wrong:

R> anytime("20161030 121647")
[1] "2016-03-14 16:00:00 CDT"
R>

๐Ÿ˜ข

@eddelbuettel
Owner

It is a terrible format but still ...

@bobjansen
Contributor
bobjansen commented Oct 30, 2016 edited

I actually like the parsimony of "20161030 121647". Luckily for us it's hardly used (IME) and I guess we could do without support for "20161030 101112.345678" if nothing else works.

@eddelbuettel
Owner

If I sit down with Boost regexp I can probably cover it.

@eddelbuettel
Owner

Grr. Cannot. Boost regexp requires linking which we don't have. Regexp is part of C++11 so I may just adopt that as a minimal standard. I am a bit tied up til the end of next week so this is on hold.

@eddelbuettel
Owner

Better news. Try the brand new branch. It should now cover

     20161103
     20161103 1011
     20161103 101112
     20161103 101112.131415
@eddelbuettel
Owner

That took one more refinement but I currently do see any failures in the tests we have.

Which does of course mean that we may need more tests :)

@bobjansen
Contributor

The examples above work for me but

anytime(c("01Sep2016 101112",     "01Sep2016 101112.345678"))
anytime(c("01Sep2016 10:11:12",   "01Sep2016 10:11:12.345678"))
anytime(c("01-Sep-2016 101112",   "01-Sep-2016 101112.345678"))

still give me trouble, milliseconds are ignored. Do these work on your machine?

@eddelbuettel
Owner

In the meantime I fixed one more mistake, but good catch on your part. I now that section dropping out:

edd@max:~/git/anytime(feature/all_digits)$ Rscript tests/allFormats.R
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"

[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12 CDT" "2016-09-01 10:11:12 CDT"
[1] "2016-09-01 10:11:12 CDT" "2016-09-01 10:11:12 CDT"
[1] "2016-09-01 10:11:00 CDT" "2016-09-01 10:11:12 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"

[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"

[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"

[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"

[1] "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT"

[1] "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT"

[1] "2016-09-01 CDT" "2016-09-01 CDT"

[1] "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT"

[1] NA                        NA                        NA                        "2016-09-01 10:11:12 CDT"

[1] "2016-09-11 00:00:00.000000 CDT" "2016-09-11 10:11:00.000000 CDT" "2016-09-11 10:11:12.000000 CDT" "2016-09-11 10:11:12.345678 CDT"
edd@max:~/git/anytime(feature/all_digits)$ 

and I guess you quite correctly put the finger on the three lines where we drop fractional seconds.

Will investigate on the train home.

@eddelbuettel
Owner
eddelbuettel commented Nov 4, 2016 edited

Hehe, that was unrelated -- the format strings were lacking the required %f.

@eddelbuettel eddelbuettel added a commit that referenced this issue Nov 5, 2016
@eddelbuettel richer formats: fractional seconds for %b datetime
with thanks to Bob Jansen in #29 (comment)
697be1d
@eddelbuettel
Owner

@bobjansen I just pushed two more commits (with the first explicitly thanking you for the note above). Please give that a whirl. I may release it tomorrow or Monday or Tuesday -- it will bring a lot of nice new stuff.

@bobjansen
Contributor

Thanks. And the new features look very useful.

I gave it a whirl but now it does this:

> anytime(c("01-Sep-2016 101112",   "01-Sep-2016 101112.345678"))
[1] "2016-09-01 10:11:00.000000 CEST" "2016-09-01 10:11:12.345678 CEST"

I think I have fix for our current set of tests though, see this tiny change. I will try to do some further testing in the coming days.

@eddelbuettel
Owner

I would argue that "01-Sep-2016 101112" is broken.

It simply is not in the list of formats. YYYYMMDD is bad enough, yet common. As we felt gracious (just kidding) we added HHMMSS to it. But what you have here is ... out of spec.

Now, you rightly realized that, and added the missing format. But as discussed in the passed, added formats is not "entirely free" as it may affect other parses. So I would try to convince you to let go of this one. What do you think? Is it really a must-have?

@bobjansen
Contributor

We have addFormats so I'd say no and I agree with your reasoning but then let's also completely remove it from tests/allFormats.R. Currently, users might believe it is being handled correctly.

@eddelbuettel
Owner
eddelbuettel commented Nov 5, 2016 edited

Oops. Good catch. If they are in tests/allFormats.R then they must once have worked .. and hence should work. That is a vote in favour of your suggested change.

And I just noticed that lines 3 and 4 of the tests no longer do subseconds. Dang.

Looks the splitting of the times needs to be more careful about excluding parts that are not just digits.

@eddelbuettel
Owner

Really appreciate the careful checking.

I just made one more commit; it looks good at my end now:

edd@max:~/git/anytime(master)$ Rscript tests/allFormats.R
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"

[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:00.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"

[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"
[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"

[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"

[1] "2016-09-01 10:11:12.000000 CDT" "2016-09-01 10:11:12.345678 CDT"

[1] "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT"

[1] "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT"

[1] "2016-09-01 CDT" "2016-09-01 CDT"

[1] "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT" "2016-09-01 CDT"

[1] NA                        NA                        NA                        "2016-09-01 10:11:12 CDT"

[1] "2016-09-11 00:00:00.000000 CDT" "2016-09-11 10:11:00.000000 CDT" "2016-09-11 10:11:12.000000 CDT" "2016-09-11 10:11:12.345678 CDT"
edd@max:~/git/anytime(master)$ 
@bobjansen
Contributor
bobjansen commented Nov 6, 2016 edited

Looks good to me too!

Edit: I do need to have 2fdfcc merged for "01-Sep-2016 101112" to work though.

@eddelbuettel
Owner

Yes -- I just cherry-picked that one in. We should be good now.

@eddelbuettel
Owner

It is really unbelievable. I just caught another special case (of the dreaded HHMMSS). We should be good now. Commit coming in a minute, and I added you to ChangeLog as well.

@bobjansen bobjansen added a commit to bobjansen/anytime that referenced this issue Nov 19, 2016
@eddelbuettel @bobjansen + bobjansen richer formats: fractional seconds for %b datetime
with thanks to Bob Jansen in eddelbuettel#29 (comment)
fb4a9e4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment