Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flexible ISO 8601 support to Utf8Parser/Formatter (and beyond?) #28942

Open
layomia opened this issue Mar 12, 2019 · 7 comments
Open

Add flexible ISO 8601 support to Utf8Parser/Formatter (and beyond?) #28942

layomia opened this issue Mar 12, 2019 · 7 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Memory
Milestone

Comments

@layomia
Copy link
Contributor

layomia commented Mar 12, 2019

This proposal stems from a discussion (dotnet/coreclr#22999) regarding adding a new format specifier “I” to the UTF8 DateTime(Offset) parser (and subsequently formatter) for ISO 8601 strings. The new format was to be used by the Utf8JsonReader/Writer.
Given feedback, it is clear that adding a new format specifier requires further investigation and design.
The workaround for the Utf8JsonReader/Writer was to implement the parsing and formatting logic internally.

We are looking to add performant UTF-8 DateTime(Offset) read/write support (https://github.com/dotnet/corefx/issues/34690, https://github.com/dotnet/corefx/issues/34576) to the Utf8JsonReader/Writer for the following ISO 8601 profile:

YYYY-MM-DD[Thh:mm[:ss[.s]][TZD]]

where:

YYYY = four-digit year
MM = two-digit month (01=January, etc.)
DD = two-digit day of month (01 through 31)
T = ‘T’ or ‘ ‘
hh = two digits of hour (00 through 23) (am/pm NOT allowed)
mm = two digits of minute (00 through 59)
ss = two digits of second (00 through 59)
s = one or more digits representing a decimal fraction of a second
TZD = time zone designator (Z or +hh:mm or -hh:mm or +hhmm or –hhmm or +hh or -hh)

Rationale and Usage

ISO 8601 is an unambiguous and prolific date and time standard, important for interoping with different systems/languages.

As part of 3.0 we are adding the new Utf8JsonReader/Writer/Document types. We have implemented (Try)GetDateTime(Offset) and WriteString(Value) methods on the Json Reader/Writer types (dotnet/corefx#35903, dotnet/corefx#35966) to read and write ISO 8601 DateTime(Offset) strings. The parsing/formatting logic is currently internal to the types:

Utf8JsonReader
https://github.com/dotnet/corefx/blob/ef1b1835a8459cee50d895b1e2040bb2336eeeda/src/System.Text.Json/src/System/Text/Json/Reader/JsonReaderHelper.Date.cs#L43-L357

Utf8JsonWriter
https://github.com/dotnet/corefx/blob/52f3ad9f5f1276833fe8f53c79b5a995c8865df9/src/System.Text.Json/src/System/Text/Json/Writer/JsonWriterHelper.Date.cs#L22-L126

The proposal for adding ISO 8601 support is to add a new custom format specifier to the Utf8Parser/Formatter for the (Try)GetDateTime(Offset) methods to depend on (in place of the internal logic) with the following specifications. This will be particularly beneficial if there is a need for flexible ISO 8601 processing beyond Json.

API (No changes)

API methods for DateTime reading and writing have already been approved.

public ref partial struct Utf8JsonReader   
{ 
    public DateTime GetDateTime(); 
    public DateTimeOffset GetDateTimeOffset(); 
    public bool TryGetDateTime(out DateTime value); 
    public bool TryGetDateTimeOffset(out DateTimeOffset value); 
}   

public ref partial struct Utf8JsonWriter   
{   
    public void WriteString(System.ReadOnlySpan<byte> utf8PropertyName, System.DateTime value, bool escape = true) { } 
    public void WriteString(System.ReadOnlySpan<byte> utf8PropertyName, System.DateTimeOffset value, bool escape = true) { } 
    public void WriteString(System.ReadOnlySpan<char> propertyName, System.DateTime value, bool escape = true) { } 
    public void WriteString(System.ReadOnlySpan<char> propertyName, System.DateTimeOffset value, bool escape = true) { } 
    public void WriteStringValue(System.DateTime value) { } 
    public void WriteStringValue(System.DateTimeOffset value) { }  
}   

public readonly partial struct JsonElement   
{ 
    public DateTime GetDateTime(); 
    public DateTimeOffset GetDateTimeOffset(); 
    public bool TryGetDateTime(out DateTime value); 
    public bool TryGetDateTimeOffset(out DateTimeOffset value);   
}   

These methods are could use the already existing Utf8Parser/Formatter types for parsing and formatting DateTime data to replace the internally implemented parse/format logic:

public static partial class Utf8Parser  
{ 
    public static bool TryParse(System.ReadOnlySpan<byte> source, out System.DateTime value, out int bytesConsumed, char standardFormat = '\0') { throw null; } 
    public static bool TryParse(System.ReadOnlySpan<byte> source, out System.DateTimeOffset value, out int bytesConsumed, char standardFormat = '\0') { throw null; }  
}  

public static partial class Utf8Formatter  
{ 
    public static bool TryFormat(System.DateTime value, System.Span<byte> destination, out int bytesWritten, System.Buffers.StandardFormat format = default(System.Buffers.StandardFormat)) { throw null; } 
    public static bool TryFormat(System.DateTimeOffset value, System.Span<byte> destination, out int bytesWritten, System.Buffers.StandardFormat format = default(System.Buffers.StandardFormat)) { throw null; }  
}  

Details

Of note are the format and standardFormat parameters of the TryParse and TryFormat methods of the Utf8Parser/Formatter. The current options for the format are “G”/standard, “O”, and “R”. Of these, only “O” implements a profile of the ISO 8601 standard, the strict Round-trip format specifier: yyyy'-'MM'-'dd'T'HH':'mm':'ss'.'fffffffK. Most ISO 8601 adopters do not use this profile. Thus, it is beneficial to implement a new format which is more permissive and also very performant. We can call it “I”.

Open Questions

  • Outside of Json, do we have a need for UTF-8 ISO support?
  • To what extent do we wish to support ISO 8601? Is the proposed profile sufficient? There are some variants which support using commas instead of periods etc.
  • If we are only supporting ISO 8601 for Json, should it remain as an implementation internal to Utf8JsonReader/Writer rather than rolling up to Utf8Parser/Formatter?
  • If ISO 8601 should be implemented on the Utf8Parser/Formatter types, should it be as a new API (e.g. TryParseAsISO/TryFormatToISO ) rather than a new format specifier (the format/standardFormat arguments)?
  • If we're adding ISO 8601 support to Utf8Parser/Formatter, do we also need/want to roll up support to DateTime{Offset}.{Try}Parse/Format/ToString?

Pull Requests

@stephentoub
Copy link
Member

If we're adding ISO 8601 support to Utf8Parser/Formatter, do we also need/want to roll up support to DateTime{Offset}.{Try}Parse/Format/ToString?

If this format is as common as is cited, then I'd say yes. If there's value in adding support to the UTF8-based implementation, there's similar value in adding support to the UTF16-based implementation.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@neuecc
Copy link

neuecc commented Apr 9, 2020

I've created new logger that only supports directly write to UTF8 to support zero allocation formatting.
https://github.com/Cysharp/ZLogger
Internally, Utf8Formatter.TryFormat(DateTime, StandardFormat) is called.

The DateTime format is important when creating a human-readable log.
And the default format is hard to read,
I want to change like "yyyy-MM-dd HH:mm:ss", but it's not supported.

For example, I'd be happy to write something like this

// Additing Log's prefix
logging.AddZLoggerConsole(options =>
{
    // Action<IBufferWriter<byte>, LogInfo>?
    options.PrefixFormatter = (writer, info) => ZString.Utf8Format(writer, "[{0}][{1:yyyy-MM-dd HH:mm:ss}]", info.LogLevel, info.Timestamp.DateTime.ToLocalTime());
});

// {1:yyyy-MM-dd HH:mm:ss} will translate to
// Utf8Formatter.TryFormat(info.Timestamp, dest, out var written, StandardFormat.Parse("yyyy-MM-dd HH:mm:ss"));

// If you can set this option, you will get the following message
// [Debug][2020-04-09 07:56:12]foo
logger.ZLogDebug("foo");

@Clockwork-Muse
Copy link
Contributor

ss = two digits of second (00 through 59)

.... officially 8601 allows for leap seconds, so the inclusive range should 00-60. This is mostly fine as DateTime(Offset) will accept (valid) leap seconds (although it converts them to the 59th second).

@ericwj
Copy link

ericwj commented Nov 9, 2020

Sorry for misusing this issue about an I format specifier to argue that O is broken and should be fixed, but it appears to me to be the right place for being highly related. For use with JSON, I think ISO8601 is usually overkill. Rather, the O specifier should be enough to work with formats conforming to RFC3339.

So, although running the risk of duplicating issues that you fixed in JsonSerializer and will adopt to fix with I, can I ask for O

  • at least to mean the same with Utf8Parser as it does with DateTime/Offset - i.e. to use the format
    yyyy'-'MM'-'dd'T'HH':'mm':'ss'.'FFFFFFFK
    instead of
    yyyy'-'MM'-'dd'T'HH':'mm':'ss'.'fffffffK
    or better yet to accept any number of fractional digits, including none?
  • with Utf8Formatter to honor any StandardFormat.Precision - even those larger than 7? Idk up to 98 is fine. The parameter is ignored completely right now.
  • with Utf8Parser to properly set the DateTimeKind? At least parsing a date ending in Z produces Unspecified, which parsing with O should never produce unless the timezone is absent completely. Even DateTime.TryParseExact('O') produces Unspecified given a string containing a timezone and parsing with RoundTripKind? That is a bug too, or I don't get it.

And can I ask for DateTime/Offset.ToString() not to throw and produce the requested number of digits when given a custom format string with any number of f or F characters in it?

Thumbs up for extracting what JsonSerializer does into generally useful API, since even just wanting to serialize UTC but convert to local after deserialization is a reason to write a JsonConverter after which the JsonSerializer behavior cannot be reproduced. Another good reason is that only reading this issue explains to me the surprise after surprise I ran into after having setup JsonSerializerOptions with a converter added and not looking back much after that.

@Clockwork-Muse
Copy link
Contributor

even those larger than 7? Idk up to 98 is fine.

In particular, Java (and NodaTime, and probably some databases) supports storing and passing nanosecond precision, so 9 digits would help with interoperability. (In terms of a maximum limit, the 19 maximum digits available were NTP to shift to a 64-bit fractional second should likely be enough, although even that is overkill)

However, note that actually/accurately getting a timestamp that precise requires special hardware - no consumer/server hardware is going to do it - and the best that can be done is 7 digits (... in most cases - some hardware or older software stacks may still be limited to less than 3). Doing this may also play havoc with round tripping (because ingestion could only accept 7 digits, and so would return truncated values).

@ericwj
Copy link

ericwj commented Nov 9, 2020

Sure, TimeSpan.FromSeconds(1).Ticks == 10_000_000 and new DateTimeOffset(10_000_000, default) == "01/01/0001 00:00:01 +00:00" exactly.

I was not talking about getting the precision. That requires not using DateTime or DateTimeOffset in the first place indeed - and thus also using some different way of parsing and formatting, different from the Utf8Parser/Formatter and DateTime/Offset API.

I'm just talking about accepting and producing the required format that can be expressed through the existing API, truncating or rounding to 7 digits when parsing and producing zero padding on formatting.

@ovska
Copy link

ovska commented Mar 2, 2022

Is there yet a consensus on whether this should be implemented, as the issue is still open but with a suggestion-label? For what it's worth, I think it's an important missing piece in Utf8Parser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Memory
Projects
None yet
Development

No branches or pull requests

8 participants