Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Instant parsing in common module #106

Closed
wants to merge 3 commits into from
Closed

Implement Instant parsing in common module #106

wants to merge 3 commits into from

Conversation

okarmazin
Copy link

@okarmazin okarmazin commented Apr 3, 2021

Resolves #56

Instant.parse now supports:

  • Time zone offsets in the form of +-hh:mm and +-hhmm
  • Time part allows the omission of colons (allows hhmmss)
  • Time part allows the omission of seconds

Implementation notes:
Input string is split by the time delimiter (T|t) into date and time parts.
Date parsing is delegated to LocalDate.parse
Time parsing employs the well known algorithm from Iso8601Utils.java
originally implemented in Jackson. This commit is based on Moshi's
version of that file.

Motivation

Any datetime library worth its salt must be able to parse numeric time zone offsets. I'm afraid here's no way around it.
I initially implemented this for myself since I need the ability to parse numeric time zone offsets. In fact I think that most of the OffsetDateTime-type strings floating around the Internet come with the +-hh:mm offset representation rather than the Z designator. Migrating from ThreeTenBP I was surprised that only the Z designator was supported.

This implementation may hold fort until a format builder is added (#58 #39)

Discussion points:

The leniency of the time parser comes free with the algorithm.

  1. Should the omission of colons in time be allowed?
  2. Should the omission of seconds be allowed?

I vote that the leniency be kept in the spirit of the "Be permissive on input, strict on output" rule of thumb.

@okarmazin okarmazin marked this pull request as draft April 3, 2021 19:34

/*
* The algorithm for parsing time and zone offset was adapted from
* https://github.com/square/moshi/blob/aea17e09bc6a3f9015d3de0e951923f1033d299e/adapters/src/main/java/com/squareup/moshi/adapters/Iso8601Utils.java
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please advise on the proper formulation of attribution that shall be added.

import kotlin.math.min
import kotlin.math.pow

internal fun parseInstantCommon(string: String): Instant = parseIsoString(string)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parseInstantCommon facade may try multiple formats before giving up such as #83 if that gets implemented.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the need for such extensibility arises, we have little enough code to be able to add it quickly.

Instant.parse now supports:
- Time zone offsets in the form of +-hh:mm
- Time part allows the omission of colons (allows hhmmss)
- Time part allows the omission of seconds

Implementation notes:
Input string is split by the time delimiter (T|t) into date and time parts.
Date parsing is delegated to LocalDate.parse
Time parsing employs the well known algorithm from Iso8601Utils.java
originally implemented in Jackson. This commit is based on Moshi's
version of that file.
val c = timePart[offset]
if (c != 'Z' && c != 'z' && c != '+' && c != '-') {
seconds = parseInt(timePart, offset, offset + 2).also { offset += 2 }
if (seconds > 59 && seconds < 63) { // https://github.com/Kotlin/kotlinx-datetime/issues/5
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is #5 applicable here, or should this be converted to idiomatic range check?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is applicable. That said, this line is strange in any case: the existing parser for LocalDateTime doesn't recognize leap seconds, so we have a disconnect between

        Instant.parse("2020-06-14T01:01:61Z")

successfully parsing and

        LocalDateTime.parse("2020-06-14T01:01:61")

failing.

Also, it should be noted Java Time's parser throws on seconds outside of [0; 60). Why should we accept leap seconds and why only in Instant and not in LocalDateTime?

@okarmazin okarmazin marked this pull request as ready for review April 3, 2021 19:57
Copy link
Collaborator

@dkhalanskyjb dkhalanskyjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thanks for the PR. We are actually interested in making this change and wanted to do it before the library's next release.

We can't accept this PR as is, though, given that the proposed changes are inconsistent with how the other parsers behave (LocalDateTime in particular). Moreover, it initially looks like this change could be implemented in a much more simple manner by parsing ZonedDateTime (EDIT: looks like the behavior of OffsetDateTime is more consistent with how the modern JDK versions handle the parsing, thanks @ilya-g for the suggestion!) on JVM and JS and slightly editing the existing parser on Native, and it's unclear why it would be worth doing anything other than that.

import kotlin.math.min
import kotlin.math.pow

internal fun parseInstantCommon(string: String): Instant = parseIsoString(string)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the need for such extensibility arises, we have little enough code to be able to add it quickly.

val c = timePart[offset]
if (c != 'Z' && c != 'z' && c != '+' && c != '-') {
seconds = parseInt(timePart, offset, offset + 2).also { offset += 2 }
if (seconds > 59 && seconds < 63) { // https://github.com/Kotlin/kotlinx-datetime/issues/5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is applicable. That said, this line is strange in any case: the existing parser for LocalDateTime doesn't recognize leap seconds, so we have a disconnect between

        Instant.parse("2020-06-14T01:01:61Z")

successfully parsing and

        LocalDateTime.parse("2020-06-14T01:01:61")

failing.

Also, it should be noted Java Time's parser throws on seconds outside of [0; 60). Why should we accept leap seconds and why only in Instant and not in LocalDateTime?

indexOfNonDigit(timePart, offset + 1) // assume at least one digit
val parseEndOffset =
min(endOffset, offset + 9) // parse up to 9 digits
val fraction = parseInt(timePart, offset, parseEndOffset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this code deals with fractions with more than nine digits that are representable as nanoseconds is by truncating the extra digits towards zero—not even by rounding them. So,

Instant.parse("2020-06-14T01:01:59.1234567899Z")
// results in 2020-06-14T01:01:59.123456789Z

This is highly questionable, as, again, parsing LocalDateTime throws in such cases, as does parsing Instant in Java. This makes sense, given that silently losing the user input is highly undesirable. Maybe an argument could be made that rounding is a sensible solution here, but even then the reasoning would have to be applied consistently across the parsers we have.

timezone = TimeZone.UTC
} else if (timezoneIndicator == '+' || timezoneIndicator == '-') {
val timezoneOffset = timePart.substring(offset)
// 18-Jun-2015, tatu: Minor simplification, skip offset of "+0000"/"+00:00"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments have little meaning without the commit history, so they probably shouldn't be included here.

if (e.isJodaDateTimeParseException()) throw DateTimeFormatException(e)
throw e
}
actual fun parse(isoString: String): Instant = parseInstantCommon(isoString)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be implemented easily—and consistently with the other parsers—by parsing a ZonedDateTime OffsetDateTime and then converting that to Instant.

The same for the Java implementation.

concreteCharParser('.')
.chainSkipping(fractionParser(0, 9, 9)) // nanos
))
.chainIgnoring(concreteCharParser('Z').or(concreteCharParser('z')))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the other parsers are implemented via ZonedDateTime OffsetDateTime, there is no need to have the parser for Instant in the common code, and the only place that requires an explicit implementation is the native part. Then, it's probably easier to modify this parser at this line so that it accepts any ZoneOffset and not just Z (and propagate this change correspondingly).

@okarmazin
Copy link
Author

okarmazin commented Apr 5, 2021

Thank you for your review. Since this code is highly inconsistent with existing logic I'll close this PR down.

As you're already planning on making the change yourselves and have it done by the next release (this month is the target according to Ilya Gorbunov on Slack I see no reason for a third party to be a part of that process. I only knew about the April release, but not that it would include offset time zone parsing. I hope this bootleg PR reinforces the fact that accepting time zone offsets is critically important. I'm looking forward to the next release!

@dkhalanskyjb P.S. I suggest that #56 be put in the 0.2.0 milestone.

@okarmazin okarmazin closed this Apr 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow to parse instants with UTC offsets other than Z
2 participants