New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom time format implementations #5123

Merged
merged 8 commits into from Jun 10, 2018

Conversation

Projects
None yet
4 participants
@straight-shoota
Contributor

straight-shoota commented Oct 15, 2017

The format patterns from Time::Format allow to define simple date formats but they are quite limited in flexibility. Many standardized formats include slight variations which must be taken into account for parsing such formats in a standards-compliant way.

This PR adds custom format implementations for some well-used time formats.
For a few it adds adds short-cut methods to Time class (like Time.parse_iso8601).

  • Time::Format::ISO_8601_DATETIME: Parser for standard time format with many variations. It should support most of the standard format including week dates, ordinal dates and optional separators. Uses RFC 3339 as formatter.
  • Time::Format::RFC_3339: More strict variation of ISO 8601, recommended for exchanging time data in computer systems.
  • Time::Format::RFC_2822: Format compatible to RFC 822 and RFC 1123 used for example in HTTP protocols.
  • Time::Format::HTTP_DATE: Proper implementation of HTTP.parse_date supporting time formats expressed as RFC 1123/RFC 2822, RFC 850 or asctime which are used by many HTTP protocols. Uses RFC_2822 as formatter.
  • Time::Format::YAML_DATE: Refactored implementation of parser for YAML datetime type.

Surpasses #5084, #4729

/cc @asterite @RX14

when Time::Kind::Local
time_zone_rfc2822
else
raise "invalid timezone"

This comment has been minimized.

@Sija

Sija Oct 15, 2017

Contributor

Capitalize exception message, please.

@straight-shoota straight-shoota force-pushed the straight-shoota:jm-time-formats-simplified branch 2 times, most recently from ef5042d to 216efdb Oct 15, 2017

@straight-shoota straight-shoota force-pushed the straight-shoota:jm-time-formats-simplified branch 4 times, most recently from 92e737f to be405fb Dec 14, 2017

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Dec 15, 2017

I've rebased this PR and refactored a bit. It would be great to get some feedback.

@straight-shoota straight-shoota force-pushed the straight-shoota:jm-time-formats-simplified branch from be405fb to a98da2a Dec 15, 2017

@straight-shoota straight-shoota force-pushed the straight-shoota:jm-time-formats-simplified branch from a98da2a to 575f4eb Mar 7, 2018

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Mar 7, 2018

I've rebased this PR to incorporate the Time::Location API added in #5324.

Requires #5786

@straight-shoota straight-shoota changed the title from Add custom time format implementations (simplified) to Add custom time format implementations Mar 7, 2018

@straight-shoota straight-shoota force-pushed the straight-shoota:jm-time-formats-simplified branch 3 times, most recently from 18fde5a to 1e1b76d Mar 7, 2018

@bcardiff

@straight-shoota

  1. may I ask to rebase this PR on master?

  2. if #5382 is merged we are mostly ok since we will have proper timezone parsing exception but adds a quick-hack, and it would be more proper to include this PR instead of #5382. Am I right?

  3. Is there anything pending that you have in mind as a follow up for this PR?

These adds some breaking-changes so having a rebase would be helpful to check the impact.

it "without time zone" do
time = Time.utc(1994, 11, 6, 8, 49, 37, nanosecond: 0)
HTTP.rfc1123_date(time).should eq("Sun, 06 Nov 1994 08:49:37 GMT")
HTTP.format_time(time).should eq("Sun, 6 Nov 1994 08:49:37 GMT")

This comment has been minimized.

@bcardiff

bcardiff Jun 5, 2018

Member

The preferred format is with two day digits as of https://tools.ietf.org/html/rfc7231#section-7.1.1.1 and https://tools.ietf.org/html/rfc2616#page-21

Somehow I would prefer rfc/iso in the function names for the format.

This comment has been minimized.

@straight-shoota

straight-shoota Jun 5, 2018

Contributor

The name was changed because the methods doesn't implement one exact standard format but generally format and parse time instances for use in a HTTP protocol context (see previous comments in #4729 (comment)).

I'll look into the preferred digit format.

end
# DEPRECATED: Use `HTTP.format_time` instead.
def self.rfc1123_time(time : Time) : String

This comment has been minimized.

@bcardiff

bcardiff Jun 5, 2018

Member

Why this is left and marked as deprecated and rfc1123_date dropped completely? I think is either both or none.

This comment has been minimized.

@straight-shoota

straight-shoota Jun 5, 2018

Contributor

Yeah, it needs to be removed.

@@ -124,7 +124,7 @@ end
struct Time
def to_json(json : JSON::Builder)
json.string(Time::Format::ISO_8601_DATE_TIME.format(self))
json.string(Time::Format::RFC_3339.format(self))

This comment has been minimized.

@bcardiff

bcardiff Jun 5, 2018

Member

Why this change if ISO_8601 is still defined?

This comment has been minimized.

@straight-shoota

straight-shoota Jun 5, 2018

Contributor

The difference is mostly negligible, but RFC 3339 is generally the better (stricter) format for this purpose. ISO_8601_DATE_TIME by default formats a offset as HHmm while RFC 3339 requires a colon: HH:mm

The JSON specification doesn't include a Date data type, so it's up to the implementation. JavaScript's Date.toJSON() actually is RFC 3339 compliant (YYYY-MM-DDTHH:mm:ss.sssZ).

This comment has been minimized.

@bcardiff

bcardiff Jun 6, 2018

Member

The references I found said ISO8601 is the format used in Javascript. I think we should stick to 8601 here. Actually been less stricter is better IMO since an API built on top of this would be less pedantic to the user.

This comment has been minimized.

@straight-shoota

straight-shoota Jun 6, 2018

Contributor

True, this is a bit confusing because RFC 3339 actually is ISO 8601. But RFC 3339 is better suited as machine-parsed exchange format. That's why I'm pretty convinced we should use RFC 3339 as output format for #to_json.

This comment has been minimized.

@straight-shoota

straight-shoota Jun 6, 2018

Contributor

I left RFC_3339 here but it returns the exact same result as ISO_8601_DATE_TIME.

@straight-shoota straight-shoota force-pushed the straight-shoota:jm-time-formats-simplified branch from 1e1b76d to 81cc01f Jun 5, 2018

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Jun 5, 2018

Thanks for taking a look at this!

  1. Done.
  2. Exactly. #5382 was supposed to be a quick bug fix, but it's been stalled for so long it would probably be better to just forget it and merge this one directly.
  3. I don't think I had any further plans. Maybe I forgot... it's been a long time.
@@ -169,7 +169,7 @@ describe "JSON serialization" do
end
it "deserializes Time" do
Time.from_json(%("2016-11-16T09:55:48-0300")).to_utc.should eq(Time.utc(2016, 11, 16, 12, 55, 48))
Time.from_json(%("2016-11-16T09:55:48-03:00")).to_utc.should eq(Time.utc(2016, 11, 16, 12, 55, 48))

This comment has been minimized.

@RX14

RX14 Jun 6, 2018

Member

We shouldn't break this

This comment has been minimized.

@bcardiff

bcardiff Jun 6, 2018

Member

According to #5123 (comment)
I agree. The JSON format should be ISO8601 by default.

This comment has been minimized.

@straight-shoota

straight-shoota Jun 6, 2018

Contributor

Output should be the RFC 3339 flavour of ISO 8601, the parser should accept more variants. That's probably the best solution.

@bcardiff

This comment has been minimized.

Member

bcardiff commented Jun 6, 2018

@straight-shoota

  1. Any further arguments on #5123 (comment) ? Why emitting in RFC3339 is better since the default in JavaScript is ISO8601? Luckily in utc they both use Z suffix. Although the added : is still ISO8601 compatible, we should be fine. @RX14 do you agree?

  2. I noticed that now nanoseconds will be rendered in to_json while before only seconds were used. This was also aligned with go behaviour that includes a separata formatter for just seconds and nano seconds. Could you change the formatter to emit only seconds as before? I don't have a strong opinion to add a 3339Nano formatter but the extra precision might be problematic in some system where just milliseconds are supported.

After 2 is addressed is a 👍 from me (and a big 🙇 for all the time re work)

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Jun 6, 2018

  1. There is no difference. ISO_8601_DATE_TIME would also use a colon in the offset, and that's how it should be. JavaScript's Date.toJSON() serializes in UTC only, by the way.
  2. Good catch! We need more specs for these things 😆 But: should the default format entirely omit sub-second precision or round to milliseconds? (that's what Date.toJSON() does). Though it probably won't be much of a difference, applications that understand milliseconds should probably be smart enough to digest a few extra digits.
    There is no specification for any of this anyway, so we might as well use the best prevision available. If that doesn't work for some use case, it's easy to employ a custom formatter with a different format.
@bcardiff

This comment has been minimized.

Member

bcardiff commented Jun 6, 2018

  1. Ok
  2. I would prefer the default to be the option that will broadly work. So seconds without fraction. If needed later a 3339Nano formatter can be added. If a spec to ensure sub-second are drop is added definitely will be a plus

If you prefer me to made the changes in 2 let me know.

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Jun 6, 2018

Done. To accomplish this, I added a fraction_digits parameter to RFC_3339.format which I think would be generally useful and could be expanded. But that's probably something for a subsequent PR because this needs proper thought.

@straight-shoota straight-shoota force-pushed the straight-shoota:jm-time-formats-simplified branch from e569afc to cb2e4fa Jun 6, 2018

@RX14

I don't like that there are changes to time/format/formatter.cr and time/format/parser.cr, and I don't like that there are new methods just moneypatched in to the existing parser and formatter, instead of being inherited or stuff being put in a module.

# * **%3N**, **%L**: milliseconds, zero padded (000, 001, ..., 999)
# * **%6N**: microseconds, zero padded (000000, 000001, ..., 999999)
# * **%9N**: nanoseconds, zero padded (000000000, 000000001, ..., 999999999)
# * **%N**: second fraction, zero padded. (Same as `%9N` but may consume more than 9 digits while parsing)

This comment has been minimized.

@RX14

RX14 Jun 7, 2018

Member

Why this doc change?

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

It helps discoverability if the directives are lexically ordered. It's unrelated, though. If you like, I can make a separate PR for this.

@@ -23,6 +23,10 @@ struct Time::Format
io << time.year / 100
end
def full_or_short_year
year

This comment has been minimized.

@RX14

RX14 Jun 7, 2018

Member

Why an alias?

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

The parser has a different implementation.

@@ -79,6 +83,12 @@ struct Time::Format
io << get_short_day_name.upcase
end
def short_day_name_with_comma?

This comment has been minimized.

@RX14

RX14 Jun 7, 2018

Member

This is a duplicate definition.

This comment has been minimized.

@straight-shoota

straight-shoota Jun 7, 2018

Contributor

Where's the duplicate?

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Jun 7, 2018

@RX14 You mean more like #5084? That design has been abandoned for good, I hope. This is the best I can think of and I don't see any problems with monkey patching a few specific methods nor any benefits in extracting them to modules.
We could put the specialized methods directly into the formatter.cr/parser.cr types to avoid monkey patching, but I think having these implementations in separate files is better code organization. All pattern methods defined in a custom files are only used within that file and namespaces with the format name.

@RX14

I still don't get why time/format/formatter.cr and time/format/parser.cr are changed. If these still have the same functionality, why do they suddenly have extra unused methods? If you're going tor refactor core time formatter stuff, it should at the very least be in a seperate commit.

# * **%m**: month number, zero padded (01, 02, ..., 12)
# * **%_m**: month number, blank padded (" 1", " 2", ..., "12")
# * **%-m**: month number (1, 2, ..., 12)
# * **%M**: minute, zero padded (00, 01, 02, ..., 59)
# * **%3N**, **%L**: milliseconds, zero padded (000, 001, ..., 999)

This comment has been minimized.

@RX14

RX14 Jun 8, 2018

Member

duplicate definition of L

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Jun 8, 2018

They provide additional methods that are used by several specific formats. So there is added functionality for the core formatter/parser feature. Refactoring is actually in 0c242b0.

@RX14

RX14 approved these changes Jun 9, 2018

@RX14

This comment has been minimized.

Member

RX14 commented Jun 9, 2018

Needs a rebase though

@bcardiff bcardiff force-pushed the straight-shoota:jm-time-formats-simplified branch from cb2e4fa to fee9ec4 Jun 9, 2018

@bcardiff

This comment has been minimized.

Member

bcardiff commented Jun 9, 2018

@straight-shoota I rebase the changes already.

@bcardiff bcardiff merged commit 36647c8 into crystal-lang:master Jun 10, 2018

4 checks passed

ci/circleci: test_darwin Your tests passed on CircleCI!
Details
ci/circleci: test_linux Your tests passed on CircleCI!
Details
ci/circleci: test_linux32 Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@bcardiff bcardiff added this to the 0.25.0 milestone Jun 10, 2018

@straight-shoota straight-shoota deleted the straight-shoota:jm-time-formats-simplified branch Jun 10, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment