Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow less than expected digits when parsing time with decimals #5317

Merged
merged 4 commits into from
Dec 10, 2017

Conversation

bcardiff
Copy link
Member

This PR allows Time.parse to use less digits in the fraction part than the specified with the %N or %L options

So Time.parse("2016-09-09 17:03:28.456789", "%F %T.%N") will mean 456789000 nanoseconds. Before this PR the meaning was 456789 which does not correspond with the decimals semantic.

Without this PR parsing dates with 1, 2, 6 digits (or other than 3 or 9) was not straight. The user would need to pad 0 before parsing. I came through this while parsing datetime from mysql which have at most 6 digits.

I don't think that making %N or %L options relaxed in the amount of digits to parse is problematic. The other path would be to have modifiers with strict / flexible decimals for parsing, yet always strict for printing.

The first specs might look a bit weird since parsing 1 as microseconds should be treated as 100 microseconds since the %L modifier should be used after a decimal separator. Similar 1 as nanoseconds should be treated as 100000000 nanoseconds.

@RX14
Copy link
Contributor

RX14 commented Nov 22, 2017

I'm not sure that everyone will expect %L and %N will expect the input to be padded. Surely if we're dealing with decimals, we should communicate to the parser we want a decimal seconds value, because that's what we're really parsing, not an integer seconds, followed by a dot, followed by an integer nanoseconds.

Our time format is similar to standard unix time format, what do they do?

@straight-shoota
Copy link
Member

straight-shoota commented Nov 22, 2017

Currently, %L and %N are probably only considered to be used as fraction of a second. So, if padding is supported, 1234 would mean 123400 microseconds / 123400000 nanoseconds.
But both nano- and microseconds can also be interpreted as independent integer fields. In this case - if padding is supported - 1234 would mean 1234 micro-/nanoseconds.

If the length is fixed to maximum precision, both can be used with the same format. If reduced precision should also be supported, we'll probably need two different formats.

For the fraction variant it should also be possible to define the number of digits (i.e. precision).
As far as I am aware of, most date formats allow arbitrary precision of nanosecond fractions. For example according to ISO 8601 it should have as many digits as necessary following the decimal sign.

My suggestion is to have (A) %L and %N accept an arbitrary number of digits (it already truncates if the precision is higher than the mirco-/nanosecond) and (B) allow to pass a number to describe a fixed amount of digits: %4N for example.
And (C) there should also be additional directives for micro-/nanoseconds as integer values.

This PR would implement (A). (B) and (C) should follow, but are probably not too important.

@RX14
Copy link
Contributor

RX14 commented Nov 22, 2017

Python uses %f and zero-pads on the right, just like our %L. It seems rather annoying to have entirely seperate format codes for each language.

Ruby uses %L for milliseconds, and has %3N, %6N and %9N for milliseconds, microseconds and nanoseconds respectively. It similarly pads zeroes on the right to handle decimals correctly.

I suggest: %N stays and is arbitrary precision (i.e. it consumes all digits, but only parses however many is the max precision of Time right now), %L is either removed or made an alias of %N, %xN only parses up to x digits. We could also possibly introduce another format code for left-padded nanoseconds instead of right-padded annoseconds, but I think that usecase is pretty rare.

@bcardiff
Copy link
Member Author

I don't find a use case for parsing %N or %L as integer entities inside a time/date time string.

Implementing %L = %3N, %6N, %9N = %N as in ruby seems reasonable but it would be a refactor, feature addition on top of the parser and the discussion of parsing the fraction with less than expected digits is independent.

I could take care of adding %xN versions, but it could be done in a separate PR.

@straight-shoota
Copy link
Member

It is not very common to use nanoseconds as integer. I've just seen an example of dates being formatted as 2017-11-22 20:23:10 123812302 (dunno where it was, probably somewhere on stackoverflow.com). But there is certainly no pressure for this.

With all these formats with different precision, it is clear how the formatter should behave. But what about the parser? Should it complain if it gets more or less than 3, 6, or 9 digits?

I don't know if there are date formats that demand for a specific number of digits.

@bcardiff
Copy link
Member Author

I've just pushed the handling of 3N, 6N, 9N with some refactors mainly in the parser. All the options allow less digits when parsing and pad when formatting.

@straight-shoota
Copy link
Member

I'm not sure if the formats with reduced precision should just truncate additional digits. Parsing 321999 with %3N as 321000000 looks wrong.

If the number of digits does not match, it should either be stric and fail or allow more digits than expected but recognize their value. I think it would be safe to let all of them consume as much digits as there are available. The only significance should be in the formatter.

@bcardiff bcardiff force-pushed the feature/fewer-digits-time-parser branch from 1d4b5a8 to a41c32c Compare November 28, 2017 12:56
@bcardiff
Copy link
Member Author

bcardiff commented Nov 28, 2017

@straight-shoota you are right, it looks weird. And ruby does the same, all the digits are preserved.
I rebased on master, add the missing docs and make the parser always read nanoseconds.

second_decimals 9
end

private def second_decimals(precision)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is unnecessary when all behave the same.

@bcardiff bcardiff force-pushed the feature/fewer-digits-time-parser branch from a41c32c to 7f164fe Compare November 28, 2017 13:30
@RX14
Copy link
Contributor

RX14 commented Nov 28, 2017

I think this is bad: in formats such as 20170101 (jan 1st 2017) you can get numbers which are next to each other. %3N should consume exactly 3 digits. If we have an arbitrary precision consume as many digits as you can that should be %N with no restriction.

@bcardiff
Copy link
Member Author

@RX14 I think that won't happen in a real use case. Also, nano and microsecond parsing was already consuming 12 consecutive digits.

Maybe we could detect at runtime some invalid patterns: Some delimiter char or non digit pattern should follow a %L, %[3,6,9]N.

@RX14
Copy link
Contributor

RX14 commented Nov 28, 2017

@bcardiff it might be rare but I don't see why we shouldn't support it. We should support parsing 20170101 and so it seems odd not to support this.

@bcardiff
Copy link
Member Author

ruby did it again 🎩 . It parses %L with additional decimals if it is at the end, but if used at the beginning it is strict.

irb(main):003:0> Time.strptime("4567891239992016-09-09T17:03:28", "%L%FT%T").strftime("%FT%T.%N")
=> "7891239992016-09-09T17:03:28.456000000"
irb(main):004:0> Time.strptime("2016-09-09T17:03:28.456789123999", "%FT%T.%L").strftime("%FT%T.%N")
=> "2016-09-09T17:03:28.456789123"

I would defer the border case for other PR. I think that improving the parsing with this PR is valuable as is given the precision of nanoseconds is not always present in string representation of times.

@RX14
Copy link
Contributor

RX14 commented Nov 28, 2017

There shouldn't be a border case: we should always be strict. I don't see why merging something which could so easilly be improved is a good idea?

@straight-shoota
Copy link
Member

@RX14 With be strict do you mean to read only as many digits as the format describes and fail if there are more (and can't be consumed by the next pattern)?
I think we can do that for %3N, %6N, %9N and let %N read an arbitrary amount of decimal digits.

@RX14
Copy link
Contributor

RX14 commented Nov 28, 2017

@straight-shoota basically.

Although, if we don't currently fail if there's left-over pattern, we shouldn't change that in this PR.

@bcardiff bcardiff force-pushed the feature/fewer-digits-time-parser branch from 7f164fe to f7c9287 Compare December 8, 2017 15:56
@bcardiff
Copy link
Member Author

bcardiff commented Dec 8, 2017

Now %xN wont consume more than x digits. So if can be together with other numeric patterns.
Also %xN allows less than x digits.
%N will parse nanoseconds, but will consume all consecutive digits. Even without max 12 digits assumption that was present before this PR.

Rebased on master. I would merge it without squashing if approved.

@RX14
Copy link
Contributor

RX14 commented Dec 8, 2017

I think it's better off being squashed though: it's one logical change.

@RX14 RX14 merged commit 490f491 into crystal-lang:master Dec 10, 2017
@RX14 RX14 added this to the Next milestone Dec 10, 2017
@straight-shoota
Copy link
Member

It seems like %L vanished from the documentation... I don't think that was on purpose? @bcardiff

@straight-shoota
Copy link
Member

Ah no, I see it was merged in the same entry with %3N. I don't think that's optimal for accessibility.

Also, I'm think %L was supposed to behave the same way as %N in accepting more than 3 digits making it different from %3N anyway.

Additionally, the N-formats are not in correct alphabetical order (though that has been the case for %N since it was introduced).

@bcardiff bcardiff deleted the feature/fewer-digits-time-parser branch May 31, 2018 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants