Decide on formatting pattern letters #64

Closed
jodastephen opened this Issue · 39 comments

4 participants

Stephen Colebourne Roger Riggs Dan Chiba masa310
Stephen Colebourne
Owner

The formatting pattern used in JSR-310 is mostly the same as SimpleDateFormat and CLDR, but deviates where it makes sense to do so.
https://github.com/ThreeTen/threeten/blob/master/src/main/java/javax/time/format/DateTimeFormatterBuilder.java#L1074

A decision needs to be taken on the letters and approach used. Key points are:

  • fractions (310 is better)
  • standalone characters (310 has a sensible approach, not implemented)
  • time zone names (CLDR has lots here)
Roger Riggs
Collaborator

Formatting letters and support are needed for yearOfEra and Era, and a format letter is needed for a combined form that shows the calendar's preferred year format. DateTimeFormatterBuilder does not mention Era.

Stephen Colebourne
Owner

Era was just not in the Javadoc.

"y" is currently year but needs to be changed to year-of-era (unfortunately). Here is the Javadoc definition

     *   y       year-of-era                 year              2004; 04
     *   u       year (proleptic)            number            2004; -201

Making this change requires changing all the ISO definitions from yyyy to uuuu.
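To make the "y" versus "u" distinction concrete, here is a minimal sketch against the `java.time` API as it later shipped in JDK 8 (an assumption: the in-development code discussed in this thread may have behaved differently). The class and method names are illustrative only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class YearLetters {
    // Formats a date with the given pattern; Locale.ROOT keeps output locale-neutral.
    public static String format(String pattern, LocalDate date) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ROOT).format(date);
    }

    public static void main(String[] args) {
        LocalDate modern = LocalDate.of(2004, 6, 15);
        // Proleptic year -201 corresponds to year-of-era 202 in the BCE era.
        LocalDate ancient = LocalDate.of(-201, 6, 15);

        System.out.println(format("uuuu", modern));  // 2004
        System.out.println(format("yy", modern));    // 04
        System.out.println(format("u", ancient));    // -201
        System.out.println(format("y", ancient));    // 202
    }
}
```

For dates in the current era the two letters print the same digits, which is why the difference only shows up once the year goes to zero or below.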

Dan Chiba

SimpleDateFormat virtually implements CLDR formatting patterns. I think it makes sense to continue this practice and avoid deviation. 310 should be fully compatible with other technologies that support CLDR. I tend to doubt if a deviation can be justified, because it would have quite unfavorable consequences.

As far as "y" and "u" go, what Stephen indicated above looks completely consistent with CLDR.

A complete listing of CLDR format pattern definitions:
http://unicode.org/reports/tr35/#Date_Field_Symbol_Table

Stephen Colebourne
Owner

Can you identify any other CLDR technologies? The CLDR pattern letters have got more random over time due to backwards compatibility. Some of them don't make a lot of sense now. Where possible, 310 follows them, but where an alternative makes more sense (no backwards compatibility issues) 310 differs. See https://github.com/ThreeTen/threeten/blob/master/src/main/java/javax/time/format/DateTimeFormatterBuilder.java#L1096 and https://github.com/ThreeTen/threeten/blob/master/src/main/java/javax/time/format/DateTimeFormatterBuilder.java#L787

Bear in mind that CLDR data is unknown except to those implementing I18N technologies. The rest of the world looks at the definition in the date formatting class.

Dan Chiba

Slide 10 "Who uses CLDR?":
http://www.w3.org/International/multilingualweb/madrid/slides/davis.pdf

I would agree that 310 is not expected to support legacy nonsense in CLDR patterns, but conformance to CLDR should not be compromised. Conformance creates quite a positive impact.

For example, let's say an application uses Java in the midtier and in the UI the Dojo toolkit which is also CLDR compliant. Then if 310 accepted CLDR pattern letters, the application can use the CLDR syntax when specifying the formatting preference to both APIs, Java and Dojo, as well as for describing the desired localized formats in the application code or in the identity store as part of the user preferences. If 310 had deviations, the application developers would have to resolve them somehow at some engineering cost or simply find it infeasible to resolve. Good conformance to CLDR is a key to superior interoperability for i18n.

This should not be like 310 defining the patterns afresh with some CLDR-compatible elements. It should be a matter of selecting the symbols to be supported from the CLDR definitions, with ideally no deviation at all, or very few where a deviation really does make sense. Because CLDR is the standard locale data repository, a supporting technology that defines its own pattern syntax defeats the purpose, in the sense that the proprietary syntax is only good within 310.

Stephen, may I ask why 310 needs to deviate for MONTH_OF_QUARTER? And why is just having fractional seconds insufficient?

Stephen Colebourne
Owner

MONTH_OF_QUARTER should be removed, it isn't going to be used enough to be in there.

Fractional seconds is a broader problem, one that the JDK gets wrong last time I looked. Parse "12.23" using the format "ss.SSS" and you'll see the problem. JSR-310's solution is broader because it allows the decimal point to be localized as well. However, I would prefer to have the JSR-310 format as ssfsss rather than ssfSSS as it is today.
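The parsing problem can be reproduced with a short sketch. It assumes the `java.time` builder API as later released in JDK 8 (`appendFraction` in particular), not the `ssfSSS` pattern syntax under discussion here; the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;
import java.util.TimeZone;

final class FractionParsing {
    // SimpleDateFormat reads "SSS" as a millisecond count, so ".23" becomes 23 ms.
    public static long legacyMillis(String text) {
        SimpleDateFormat sdf = new SimpleDateFormat("ss.SSS", Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return sdf.parse(text).getTime(); // millis since 1970-01-01T00:00:00Z
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // A true decimal fraction: ".23" is 0.23 s, i.e. 230,000,000 ns.
    public static long fractionNanos(String text) {
        DateTimeFormatter fmt = new DateTimeFormatterBuilder()
                .appendValue(ChronoField.SECOND_OF_MINUTE, 2)
                .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
                .toFormatter(Locale.ROOT);
        return fmt.parse(text).getLong(ChronoField.NANO_OF_SECOND);
    }

    public static void main(String[] args) {
        System.out.println(legacyMillis("12.23"));  // 12023 (12 s + 23 ms)
        System.out.println(fractionNanos("12.23")); // 230000000
    }
}
```

The legacy parser is off by a factor of ten for this input because it treats the digits after the point as an integer field rather than as a fraction.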

I note that 310, like the JDK, has no support for the concept of standalone output types.

Roger Riggs
Collaborator

Support for standalone and formatted output types is being added to java.util.Calendar in SE 8.

Stephen Colebourne
Owner

Altered code to drop MONTH_OF_QUARTER "q", and to adopt most of CLDR. In my opinion, using a prefix for standalone fields is a better user experience, but the difference isn't enough to justify it.

It is important to note that the JDK defines "u" differently to CLDR. To be effective, I've defined "E" and "EE" as being equivalent to JDK "u" and "uu", leaving "EEE" for the day-of-week abbreviation. Pattern letters "e" are used for the localized day-of-week number (not implemented).

Currently, JDK8 only has support for standalone months, not quarters or day-of-week. If Oracle wants full support of standalone, it will have to ensure that the data is available.

This section outlines the updated comments and TODOs:
https://github.com/ThreeTen/threeten/blob/c1dda217142f495371829e859bdf075622d548e9/src/main/java/javax/time/format/DateTimeFormatterBuilder.java#L1094

Dan Chiba

Thank you for the updates Stephen. It's great to see the improved interoperability.

To address the conflict of "u", I am thinking 310 should suggest either updating the pattern string or translating it to "e", the CLDR equivalent, as it enters a 310 application. The 310 API is to take "u" as extended/proleptic year per the CLDR definition.

I thought about possible options to deviate from CLDR, for instance, treating a single letter "u" as if it were CLDR "e" (the day number of the week, 1 through 7) only when "u" appeared in conjunction with "w" or "W", but I concluded that none of them seems as good as just following CLDR.

310 should leave "E" and "EE" alone because otherwise 310 would return a number when a user is expecting the short day per CLDR definition.

Stephen Colebourne
Owner

CLDR looks like it is approving "VV" for the time-zone ID and "x" and "X" based patterns for offsets. We should adjust to match CLDR exactly for these.
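As an illustration of these letters, the following sketch uses `DateTimeFormatter` as it eventually shipped in JDK 8, on the assumption that the released behaviour reflects the adjustment proposed here. The helper names are made up for the example.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class ZoneLetters {
    private static final LocalDateTime NOON = LocalDateTime.of(2004, 1, 5, 12, 0);

    // Formats only the offset of a fixed date-time using the given pattern.
    public static String offset(String pattern, ZoneOffset off) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ROOT)
                .format(OffsetDateTime.of(NOON, off));
    }

    // "VV" prints the time-zone ID itself.
    public static String zoneId(ZoneId zone) {
        return DateTimeFormatter.ofPattern("VV", Locale.ROOT)
                .format(ZonedDateTime.of(NOON, zone));
    }

    public static void main(String[] args) {
        ZoneOffset india = ZoneOffset.ofHoursMinutes(5, 30);
        System.out.println(offset("xx", india));           // +0530
        System.out.println(offset("xxx", india));          // +05:30
        System.out.println(offset("XXX", ZoneOffset.UTC)); // Z
        System.out.println(offset("xxx", ZoneOffset.UTC)); // +00:00
        System.out.println(zoneId(ZoneId.of("Europe/Paris"))); // Europe/Paris
    }
}
```

The only difference between the upper-case and lower-case offset letters in this sketch is how a zero offset is printed.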

Roger Riggs
Collaborator

More differences between Formatter Pattern letters and CLDR need to be resolved:

| Format Letter | CLDR | Threeten Formatter | Supported by j.u.Calendar |
|---|---|---|---|
| "G" | 1..2 formats as abbreviation | same as CLDR (*fixed) | same as CLDR |
| "Y" | week based year | same as CLDR | same as CLDR |
| "L" | Standalone month | same as CLDR | same as CLDR |
| "g" | Modified Julian Day number | -- not supported | -- not supported |
| "E" | 1..2 formats as day of week short name | 1..2 formats as 1 or 2 digit numeric, similar to CLDR 'c' and 'e' | same as CLDR |
| "e" | local day of week number from locale base | same as CLDR | not supported |
| "c" | 1: numeric, 3: abbr, 4: long, 5: narrow local day of week | same as CLDR | not supported |
| "a" | AM or PM | 1..3 format as short, 4 as Full, 5 as Narrow | same as CLDR |
| "Z" | 1..3: RFC 822, 4: local GMT format; 5: XML format | 1: +HHMM, 2: +HH:MM, : RFC822 GMT; Threeten does not support 4 and 5 | same as CLDR |
| "Q" | Quarter names | 1..2: format as numeric | not supported |
| "q" | Standalone Quarter names | 1..2: format as numeric | not supported |
| "u" | extended year | extended year (*fixed) | legacy defined as ISO day of week number |

masa310

Who will finalize the pattern letter spec for threeten?

masa310

'G' was changed to be compatible with CLDR.
http://hg.openjdk.java.net/threeten/threeten/jdk/rev/789f1fc222b5

'y' is still year, not year of era.

I think the major difference between threeten and CLDR is the standalone style support. Currently TextStyle defines only the sizes. For example, 'c' designates standalone names which are used in fi locale in CLDR.

Roger Riggs
Collaborator

Thanks for the correction on 'G'.

Stephen Colebourne
Owner

Of those listed above,

  • G - fixed
  • Y - requires design/implementation around localized WOY vs ISO WOY (our data model differs from LDML)
  • L - could be supported assuming JDK has the data, although would require a new TextStyle
  • g - can be easily supported
  • E/e/c - I think what we have now is simpler and better than the alternatives (E = ISO, e = local), however "e" should be extended to cover text as per LDML
  • a - 310 extends LDML in a sensible way
  • Z - matches LDML/SimpleDateFormat for 1..3, 4 and 5 not implemented
  • Q - requires quarter name data (see separate issue)
  • u - implemented
  • j - not a real pattern letter
Roger Riggs
Collaborator

The prevailing view at Oracle is to support pattern letters as defined by CLDR or to create non-overlapping extensions. In a few cases, we should ask CLDR to clarify or extend the CLDR definitions to support functions proposed for Threeten as has been done for "n", "N", etc.
For the list above:

  • L - Add support for standalone month names
  • g - omit for now
  • E - revert to the CLDR semantics for E, EE but ask/suggest to CLDR how the non-localized weekday number should be formatted.
  • e - implement day of week
  • c - implement standalone day of week
  • a - ask CLDR if they can extend to be consistent with other pattern letters, for short, narrow, full, etc.
  • Z - implement cases 4,5
  • Y - implement with the related 'W' differences ... the data model will need design work.
Roger Riggs
Collaborator

The TextStyle enum will need to define symbols for the standalone variants of FULL, SHORT, NARROW.
The description should mention that if a specific Standalone text is not available it will revert to the non-standalone text. That description can be in TextStyle or in the Formatter/FormatterBuilder.

Stephen Colebourne
Owner

Whilst I suspect standalone is orthogonal to TextStyle, it's easier to manage with three additional constants. Fallback is more a property of the builder.

masa310

I think fallback should be a property of the formatter rather than the builder.

Roger Riggs
Collaborator

Added TextStyle enum values for standalone in http://hg.openjdk.java.net/threeten/threeten/jdk/rev/e64803b886e3
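For reference, a small sketch of how the standalone constants relate to the normal ones. It assumes the `TextStyle` API as released in JDK 8 (`asStandalone`, `asNormal`, `isStandalone`); the changeset linked above may have used different names at the time. Whether a standalone name differs from the format-context name, or falls back to it, depends on the locale data available to the runtime, so no output is asserted for that part.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

final class StandaloneStyles {
    // Looks up a month name in the requested style and locale.
    public static String monthName(Month month, TextStyle style, Locale locale) {
        return month.getDisplayName(style, locale);
    }

    public static void main(String[] args) {
        // Each width has a format-context constant and a standalone twin.
        for (TextStyle style : TextStyle.values()) {
            System.out.println(style
                    + " standalone=" + style.isStandalone()
                    + " asStandalone=" + style.asStandalone()
                    + " asNormal=" + style.asNormal());
        }
        // Finnish is the locale mentioned earlier in the thread as having
        // distinct standalone names; the printed values depend on locale data.
        Locale fi = new Locale("fi");
        System.out.println(monthName(Month.JANUARY, TextStyle.FULL, fi));
        System.out.println(monthName(Month.JANUARY, TextStyle.FULL_STANDALONE, fi));
    }
}
```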

masa310

Added standalone text support: http://hg.openjdk.java.net/threeten/threeten/jdk/rev/02b563dff230
(added pattern letters 'L', 'c', and 'q'. changed 'e' to support text.)

masa310

Per Roger's request, I will be changing "E" and "EE" to be compatible with CLDR.

masa310

More LDML/CLDR compatibility support: http://hg.openjdk.java.net/threeten/threeten/jdk/rev/483ee6c117d4

  • "E" and "EE" are the same thing as "EEE". (Number is no longer supported with 'E'.)
  • "cc" is no longer valid. Only "c" is valid as Number.
  • 6-letter 'E', 'e', and 'c' are accepted and treated as short style.
  • Only one letter 'F' is accepted.
  • Up to 2 letters of 'd', 'H', 'h','K', 'k', 'm', and 's' are accepted.
  • Up to 3 letters of 'D' are accepted.
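The letter-count rules listed above can be checked with a sketch like the following. It runs against `DateTimeFormatter.ofPattern` as released in JDK 8, which is assumed here to have kept these rules; `accepted` and `dayText` are hypothetical helper names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class LetterCounts {
    // True if the pattern compiles; ofPattern throws IllegalArgumentException otherwise.
    public static boolean accepted(String pattern) {
        try {
            DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Formats Monday 5 January 2004 with the given pattern in English.
    public static String dayText(String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH)
                .format(LocalDate.of(2004, 1, 5));
    }

    public static void main(String[] args) {
        System.out.println(dayText("E"));     // Mon
        System.out.println(dayText("EE"));    // Mon
        System.out.println(dayText("EEE"));   // Mon
        System.out.println(dayText("EEEE"));  // Monday
        System.out.println(accepted("dd"));   // true
        System.out.println(accepted("ddd"));  // false
        System.out.println(accepted("c"));    // true
        System.out.println(accepted("cc"));   // false
        System.out.println(accepted("DDDD")); // false
        System.out.println(accepted("FF"));   // false
    }
}
```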
Stephen Colebourne
Owner

I don't think we should be supporting 6 letter 'E'/'e'/'c'. These refer to a fourth text style width, only applicable to day-of-week. While the code maps 6 letters to our current SHORT style, this isn't really giving the user anything they don't have with 3 letters. Unless the plan is to provide the additional TextStyle and CLDR data, it seems unnecessary to support 6 letters at all at this point.

Other letters needing fixing are Y, F and #.

Roger Riggs
Collaborator

I am still working on a full alignment of Y and w with CLDR to mean WeekBasedYear.

Roger Riggs
Collaborator

With respect to the 6 letter E/e/c, we determined that for pattern letters that are supported by java.time the behavior should match CLDR. So even if they duplicate other pattern combinations they should be supported.

Stephen Colebourne
Owner

The point I'm making is that 6 letter Eec is wrong, because it is using the wrong data. CLDR has a fourth TextStyle, but Java has no data for that, nor a TextStyle for it. Until we have the data, there is no point in having the 6 letter patterns.

masa310

CLDR data in JDK 8 is 21.0.1, which doesn't have the "short" style of dayWidth. ("narrow" <= "short" <= "abbreviated" in width)

I added 6-letter Eec just for compatibility with mapping to TextStyle.SHORT. I thought it'd be better than rejecting valid CLDR patterns. It might be better to define the 4th text style with TextStyle and still map 6-letter to SHORT because we don't have data. If we want to strictly follow DTD 21, I will remove the 6-letter Eec "support".

Stephen Colebourne
Owner

Either removing 6 letters, or adding a new TextStyle are acceptable to me, whereas the current state is not.

masa310

Assuming that 310 requires new pattern letters for time zone formatting in the latest LDML spec, I'd suggest adding a new text style.

LDML uses "wide", "abbreviated", "short", and "narrow", while JDK traditionally uses LONG and SHORT with adding NARROW. The TextStyle constants may be aligned with the LDML width names.

Dan Chiba

The LDML alignment sounds favorable.

Roger Riggs
Collaborator

The consensus at Oracle is to drop the 6 letter versions for Eec and see if a requirement develops for them by the next release.

The alignment with CLDR names is a separate topic, TextStyle already has a SHORT which we use for the CLDR Abbr. It might be constructive to change SHORT to MEDIUM or ABBREVIATED unless we can come up with another term to introduce later for the CLDR SHORT.
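The effect of dropping the 6 letter forms can be sketched as follows, assuming the `java.time` API as released in JDK 8: three day-of-week widths are available through `TextStyle`, and a 6 letter pattern is rejected rather than mapped to SHORT. The class and method names are illustrative only.

```java
import java.time.DayOfWeek;
import java.time.format.DateTimeFormatter;
import java.time.format.TextStyle;
import java.util.Locale;

final class DayWidths {
    // Returns the English name of Monday in the requested width.
    public static String monday(TextStyle style) {
        return DayOfWeek.MONDAY.getDisplayName(style, Locale.ENGLISH);
    }

    // True if the pattern compiles; ofPattern throws IllegalArgumentException otherwise.
    public static boolean accepted(String pattern) {
        try {
            DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(monday(TextStyle.NARROW)); // M
        System.out.println(monday(TextStyle.SHORT));  // Mon (the CLDR "abbreviated" width)
        System.out.println(monday(TextStyle.FULL));   // Monday
        System.out.println(accepted("EEEEE"));        // true  (narrow)
        System.out.println(accepted("EEEEEE"));       // false (no fourth width)
    }
}
```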

Stephen Colebourne
Owner

I'm happy with the decision to remove 6 letters.

I think that we should rename the constants. FULL should be changed to LONG to match the Calendar public constants. SHORT is a problem, as changing the name clashes with Calendar. But I think my favourite option is changing SHORT to MEDIUM (mostly because ABBREVIATED is too long a word and because it leaves space for SHORT in the future).

Roger Riggs
Collaborator

LDML does not use SHORT in the same context as wide, abbreviated, and narrow. The DTD only uses SHORT in the context of date, time and date/time format lengths and in zone/metazone.

DateTimeFormatterBuilder.appendLocalizedOffset refers to FormatStyle in the method javadoc but the argument to the method is TextStyle; instead of blurring the purpose of TextStyle, and more in keeping with LDML, the styles for ZoneId and ZoneOffset should perhaps use FormatStyle. It already has SHORT and it seems more consistent with LDML.
Renaming TextStyle.FULL to LONG is a good idea and matches LDML and the JDK. TextStyle.SHORT can stay as is (matching the java.text) since TextStyle.SHORT is not the same as FormatStyle.SHORT.

Stephen Colebourne
Owner

There is also the question of whether LDML will change in the future.

LDML says: "Month, day, and quarter names may vary along two axes: the width and the context. The context is either format (the default), the form used within a date format string (such as "Saturday, November 12th", or stand-alone, the form used independently, such as in calendar headers. The width can be wide (the default), abbreviated, or narrow; for days only, the width can also be short, which is ideally between the abbreviated and narrow widths, but must be no longer than abbreviated and no shorter than narrow (if short day names are not explicitly specified, abbreviated day names are used instead). "
So there are eight values "wide", "abbreviated", "short" and "narrow" plus standalone variants, where "short" is only used for day-of-week.

I don't think we're going to match the LDML names here, so we should choose what is best for Java. There, the question is what the difference really is between FormatStyle and TextStyle. My view has always been that format style is for the full descriptive forms, as in date style and time style. Everything else was text style. If we don't follow that, then where are the boundaries and why?

Stephen Colebourne
Owner

Agreed to leave TextStyle and FormatStyle as is. Just the removal of the 6 letter patterns to go.
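A short sketch of the resulting split, assuming the enums as released in JDK 8: `FormatStyle` selects a whole localized date or time layout, while `TextStyle` selects the width of a single field's text. The `hasConstant` helper is hypothetical, and the localized date output depends on the runtime's locale data, so it is printed without an expected value.

```java
import java.time.LocalDate;
import java.time.Month;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.time.format.TextStyle;
import java.util.Locale;

final class StyleKinds {
    // True if the enum type declares a constant with the given name.
    public static <E extends Enum<E>> boolean hasConstant(Class<E> type, String name) {
        try {
            Enum.valueOf(type, name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2004, 1, 5);

        // FormatStyle: a complete localized layout for the whole date.
        DateTimeFormatter whole = DateTimeFormatter
                .ofLocalizedDate(FormatStyle.LONG)
                .withLocale(Locale.ENGLISH);
        System.out.println(whole.format(date));

        // TextStyle: the width of one field's text.
        System.out.println(Month.JANUARY.getDisplayName(TextStyle.SHORT, Locale.ENGLISH));

        // TextStyle kept FULL rather than being renamed to LONG.
        System.out.println(hasConstant(TextStyle.class, "FULL"));   // true
        System.out.println(hasConstant(TextStyle.class, "LONG"));   // false
        System.out.println(hasConstant(FormatStyle.class, "LONG")); // true
    }
}
```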

Stephen Colebourne jodastephen closed this