The formatting pattern used in JSR-310 is mostly the same as SimpleDateFormat and CLDR, but deviates where it makes sense to do so.
A decision needs to be taken on the letters and approach used. Key points are:
Formatting letters and support are needed for yearOfEra and Era, and a format letter is needed for a combined form that shows the calendar's preferred year format. DateTimeFormatterBuilder does not mention Era.
Update Javadoc for era, prepare for year/year-of-era
Era was just not in the Javadoc.
"y" is currently year but needs to be changed to year-of-era (unfortunately). Here is the Javadoc definition
* y year-of-era year 2004; 04
* u year (proleptic) number 2004; -201
Making this change requires changing all the ISO definitions from yyyy to uuuu.
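To make the "y" versus "u" distinction concrete, here is a minimal sketch. It assumes the java.time API as it later shipped in Java 8, where "u" is the proleptic year and "y" is year-of-era; the class and method names are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class YearLetterDemo {
    // 'u' formats the proleptic year, which can be zero or negative.
    static String prolepticYear(LocalDate date) {
        return DateTimeFormatter.ofPattern("uuuu").format(date);
    }

    // 'y' formats year-of-era, which is always positive and is
    // ambiguous unless the era is printed alongside it.
    static String yearOfEra(LocalDate date) {
        return DateTimeFormatter.ofPattern("yyyy").format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(-201, 6, 15);
        System.out.println(prolepticYear(date)); // -0201
        System.out.println(yearOfEra(date));     // 0202
    }
}
```

For ISO-style output, "uuuu" is the safer choice because the sign carries the information that "yyyy" would otherwise need an era for.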
SimpleDateFormat virtually implements CLDR formatting patterns. I think it makes sense to continue this practice and avoid deviation. 310 should be fully compatible with other technologies that support CLDR. I tend to doubt if a deviation can be justified, because it would have quite unfavorable consequences.
As far as "y" and "u" go, what Stephen indicated above looks completely consistent with CLDR.
A complete listing of CLDR format pattern definitions:
Can you identify any other CLDR technologies? The CLDR pattern letters have got more random over time due to backwards compatibility. Some of them don't make a lot of sense now. Where possible, 310 follows them, but where an alternative makes more sense (no backwards compatibility issues) 310 differs. See https://github.com/ThreeTen/threeten/blob/master/src/main/java/javax/time/format/DateTimeFormatterBuilder.java#L1096 and https://github.com/ThreeTen/threeten/blob/master/src/main/java/javax/time/format/DateTimeFormatterBuilder.java#L787
Bear in mind that CLDR data is unknown except to those implementing I18N technologies. The rest of the world looks at the definition in the date formatting class.
Slide 10 "Who uses CLDR?":
I would agree that 310 is not expected to support legacy nonsense in CLDR patterns, but conformance to CLDR should not be compromised. Conformance has a clearly positive impact.
For example, let's say an application uses Java in the midtier and in the UI the Dojo toolkit which is also CLDR compliant. Then if 310 accepted CLDR pattern letters, the application can use the CLDR syntax when specifying the formatting preference to both APIs, Java and Dojo, as well as for describing the desired localized formats in the application code or in the identity store as part of the user preferences. If 310 had deviations, the application developers would have to resolve them somehow at some engineering cost or simply find it infeasible to resolve. Good conformance to CLDR is a key to superior interoperability for i18n.
This should not be a case of 310 defining the patterns afresh with some CLDR-compatible elements. It should be a matter of selecting the symbols to be supported from the CLDR definitions, ideally with no deviation at all, or with very few where a deviation really does make sense. Because CLDR is the standard locale data repository, a supporting technology that defines its own pattern syntax defeats the purpose, in the sense that the proprietary syntax is only usable in 310.
Stephen, may I ask why 310 needs to deviate for MONTH_OF_QUARTER? Why is just having fractional seconds insufficient?
MONTH_OF_QUARTER should be removed, it isn't going to be used enough to be in there.
Fractional seconds is a broader problem, one that the JDK got wrong the last time I looked. Parse "12.23" using the format "ss.SSS" and you'll see the problem. JSR-310's solution is broader because it allows the decimal point to be localized as well. However, I would prefer to have the JSR-310 format as ssfsss rather than ssfSSS as it is today.
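The "12.23" problem can be reproduced with a short sketch. The legacy half uses SimpleDateFormat exactly as described; the java.time half assumes the DateTimeFormatterBuilder.appendFraction method as it later shipped in Java 8, not the "f" pattern letter discussed here. Class and method names are made up for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;
import java.util.TimeZone;

final class FractionDemo {
    // Legacy parse: SSS is read as a millisecond count, so "23" means 23 ms.
    static long legacyMillis(String text) {
        SimpleDateFormat sdf = new SimpleDateFormat("ss.SSS", Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return sdf.parse(text).getTime(); // millis since 1970-01-01T00:00Z
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // java.time parse: the digits are read as a decimal fraction of a second.
    static int fractionNanos(String text) {
        DateTimeFormatter f = new DateTimeFormatterBuilder()
                .appendValue(ChronoField.SECOND_OF_MINUTE, 2)
                .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
                .toFormatter(Locale.ROOT);
        return f.parse(text).get(ChronoField.NANO_OF_SECOND);
    }

    public static void main(String[] args) {
        System.out.println(legacyMillis("12.23"));  // 12023, i.e. 12.023 s
        System.out.println(fractionNanos("12.23")); // 230000000, i.e. 0.23 s
    }
}
```

The final `true` argument to appendFraction asks for a decimal point, which is the part that can follow the formatter's decimal style.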
I note that 310, like the JDK, has no support for the concept of standalone output types.
Support for standalone and formatted output types is being added to java.util.Calendar in SE 8.
Remove month-of-quarter (q) from pattern letters
Adopt CLDR for pretty much everything
Altered code to drop MONTH_OF_QUARTER "q", and to adopt most of CLDR. In my opinion, using a prefix for standalone fields is a better user experience, but the difference isn't enough to justify it.
It is important to note that the JDK defines "u" differently to CLDR. To be effective, I've defined "E" and "EE" as being equivalent to JDK "u" and "uu", leaving "EEE" for the day-of-week abbreviation. Pattern letters "e" are used for the localized day-of-week number (not implemented).
Currently, JDK8 only has support for standalone months, not quarters or day-of-week. If Oracle wants full support of standalone, it will have to ensure that the data is available.
This section outlines the updated comments and TODOs:
Thank you for the updates Stephen. It's great to see the improved interoperability.
To address the conflict over "u", I think 310 should suggest either updating the pattern string or translating it to "e", the CLDR equivalent, as it enters a 310 application. The 310 API is to take "u" as extended/proleptic year per the CLDR definition.
I thought about possible options to deviate from CLDR, for instance treating a single letter "u" as if it were CLDR "e" (the day number of the week, 1 through 7) only when "u" appears in conjunction with "w" or "W", but I concluded that none of them seems as good as just following CLDR.
310 should leave "E" and "EE" alone because otherwise 310 would return a number when a user is expecting the short day per CLDR definition.
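The "u" conflict is easy to see side by side. This is a minimal sketch assuming SimpleDateFormat from Java 7 or later (where "u" is the day number of the week) and java.time as it later shipped (where "u" is the proleptic year); the class and method names are made up for illustration.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class LetterUDemo {
    // SimpleDateFormat: 'u' is the day number of the week, 1 = Monday.
    static String legacyU(LocalDate date) {
        SimpleDateFormat sdf = new SimpleDateFormat("u", Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date legacy = Date.from(date.atStartOfDay(ZoneOffset.UTC).toInstant());
        return sdf.format(legacy);
    }

    // java.time: 'u' is the proleptic year, following CLDR.
    static String javaTimeU(LocalDate date) {
        return DateTimeFormatter.ofPattern("u", Locale.ROOT).format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2004, 1, 1); // a Thursday
        System.out.println(legacyU(date));   // 4
        System.out.println(javaTimeU(date)); // 2004
    }
}
```

A pattern string carried over unchanged from SimpleDateFormat therefore silently changes meaning, which is why translating "u" at the boundary was suggested.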
CLDR looks likely to approve "VV" for the time-zone ID and "x"- and "X"-based patterns for offsets. We should adjust to match CLDR exactly for these.
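As a rough illustration of those letters, the sketch below assumes the pattern support that later shipped in java.time: "VV" prints the zone ID, "X" prints "Z" for a zero offset, and "x" always prints digits. The class and method names are made up for illustration.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class ZoneLetterDemo {
    // Formats midnight on 2012-01-01 in the given zone with the given pattern.
    static String format(String pattern, String zoneId) {
        ZonedDateTime zdt = ZonedDateTime.of(2012, 1, 1, 0, 0, 0, 0, ZoneId.of(zoneId));
        return DateTimeFormatter.ofPattern(pattern, Locale.ROOT).format(zdt);
    }

    public static void main(String[] args) {
        System.out.println(format("VV", "Europe/Paris"));  // Europe/Paris
        System.out.println(format("XXX", "Europe/Paris")); // +01:00
        System.out.println(format("XXX", "UTC"));          // Z
        System.out.println(format("xxx", "UTC"));          // +00:00
    }
}
```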
More differences between Formatter Pattern letters and CLDR need to be resolved:
Who will finalize the pattern letter spec for threeten?
'G' was changed to be compatible with CLDR.
'y' is still year, not year of era.
I think the major difference between threeten and CLDR is the standalone style support. Currently TextStyle defines only the sizes. For example, 'c' designates standalone names which are used in fi locale in CLDR.
Thanks for the correction on 'G'.
Of those listed above,
The prevailing view at Oracle is to support pattern letters as defined by CLDR or to create non-overlapping extensions. In a few cases, we should ask CLDR to clarify or extend the CLDR definitions to support functions proposed for Threeten as has been done for "n", "N", etc.
For the list above:
The TextStyle enum will need to define symbols for the standalone variants of FULL, SHORT, NARROW.
The description should mention that if a specific Standalone text is not available it will revert to the non-standalone text. That description can be in TextStyle or in the Formatter/FormatterBuilder.
Whilst I suspect standalone is orthogonal to TextStyle, it's easier to manage with three additional constants. Fallback is more a property of the builder.
I think fallback should be a property of the formatter rather than the builder.
Added TextStyle enum values for standalone in http://hg.openjdk.java.net/threeten/threeten/jdk/rev/e64803b886e3
Added standalone text support: http://hg.openjdk.java.net/threeten/threeten/jdk/rev/02b563dff230
(added pattern letters 'L', 'c', and 'q'. changed 'e' to support text.)
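A minimal sketch of the standalone styles follows. It assumes the TextStyle enum as it later shipped in Java 8, with three standalone constants and the asStandalone(), asNormal() and isStandalone() methods. The printed month name depends on the locale data in the running JDK, so no output is claimed for it; the class and method names are made up for illustration.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

final class StandaloneStyleDemo {
    // Maps a style to its standalone counterpart (a no-op if already standalone).
    static TextStyle standaloneOf(TextStyle style) {
        return style.asStandalone();
    }

    // Maps a style back to its formatting (non-standalone) counterpart.
    static TextStyle normalOf(TextStyle style) {
        return style.asNormal();
    }

    public static void main(String[] args) {
        for (TextStyle style : TextStyle.values()) {
            System.out.println(style + " standalone=" + style.isStandalone());
        }
        // Finnish is one locale where CLDR distinguishes the two contexts.
        Locale fi = Locale.forLanguageTag("fi");
        System.out.println(Month.JANUARY.getDisplayName(TextStyle.FULL, fi));
        System.out.println(Month.JANUARY.getDisplayName(TextStyle.FULL_STANDALONE, fi));
    }
}
```

Keeping the standalone variants as separate constants means a caller can switch context with a single method call instead of carrying a second parameter.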
Per Roger's request, I will be changing "E" and "EE" to be compatible with CLDR.
More LDML/CLDR compatibility support: http://hg.openjdk.java.net/threeten/threeten/jdk/rev/483ee6c117d4
I don't think we should be supporting 6 letter 'E'/'e'/'c'. These refer to a fourth text style width, only applicable to day-of-week. While the code maps 6 letters to our current SHORT style, this isn't really giving the user anything they don't have with 3 letters. Unless the plan is to provide the additional TextStyle and CLDR data, it seems unnecessary to support 6 letters at all at this point.
Other letters needing fixing are Y, F and #.
Document builder method calls consistently - http://hg.openjdk.java.net/threeten/threeten/jdk/rev/3f93b60b87bd
Correctly reserve # - http://hg.openjdk.java.net/threeten/threeten/jdk/rev/a19e0b4b7093
Fix bugs in F and Y letters (Y still not perfect) - http://hg.openjdk.java.net/threeten/threeten/jdk/rev/5280a0b575ba
I am still working on a full alignment of Y and w with CLDR to mean WeekBasedYear.
With respect to the 6 letter E/e/c: we determined that for pattern letters that are supported by java.time, the behavior should match CLDR. So even if they duplicate other pattern combinations, they should be supported.
The point I'm making is that 6 letter Eec is wrong, because it is using the wrong data. CLDR has a fourth TextStyle, but Java has no data for that, nor a TextStyle for it. Until we have the data, there is no point in having the 6 letter patterns.
CLDR data in JDK 8 is 21.0.1, which doesn't have the "short" style of dayWidth. ("narrow" <= "short" <= "abbreviated" in width)
I added 6-letter Eec just for compatibility with mapping to TextStyle.SHORT. I thought it'd be better than rejecting valid CLDR patterns. It might be better to define the 4th text style with TextStyle and still map 6-letter to SHORT because we don't have data. If we want to strictly follow DTD 21, I will remove the 6-letter Eec "support".
Either removing 6 letters, or adding a new TextStyle are acceptable to me, whereas the current state is not.
Assuming that 310 requires new pattern letters for time zone formatting in the latest LDML spec, I'd suggest adding a new text style.
LDML uses "wide", "abbreviated", "short", and "narrow", while the JDK traditionally uses LONG and SHORT, with NARROW added. The TextStyle constants may be aligned with the LDML width names.
The LDML alignment sounds favorable.
The consensus at Oracle is to drop the 6 letter versions for Eec and see if requirement develops for them by the next release.
The alignment with CLDR names is a separate topic, TextStyle already has a SHORT which we use for the CLDR Abbr. It might be constructive to change SHORT to MEDIUM or ABBREVIATED unless we can come up with another term to introduce later for the CLDR SHORT.
I'm happy with the decision to remove 6 letters.
I think that we should rename the constants. FULL should be changed to LONG to match the Calendar public constants. SHORT is a problem, as changing the name clashes with Calendar. But I think my favourite option is changing SHORT to MEDIUM (mostly because ABBREVIATED is too long a word and because it leaves space for SHORT in the future).
Fixed 'Y' and 'w' to match CLDR with http://hg.openjdk.java.net/threeten/threeten/jdk/rev/f5c2a5c95092
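To show why week-based-year differs from the calendar year, here is a sketch that pins the ISO week definition through WeekFields.ISO. The "Y" and "w" pattern letters themselves follow the formatter's locale, so this builder-based version is an assumption chosen to keep the result locale-independent; the class and method names are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.WeekFields;
import java.util.Locale;

final class WeekBasedYearDemo {
    // Formats a date as week-based-year and week number under ISO week rules.
    static String isoWeekDate(LocalDate date) {
        DateTimeFormatter f = new DateTimeFormatterBuilder()
                .appendValue(WeekFields.ISO.weekBasedYear(), 4)
                .appendLiteral("-W")
                .appendValue(WeekFields.ISO.weekOfWeekBasedYear(), 2)
                .toFormatter(Locale.ROOT);
        return f.format(date);
    }

    public static void main(String[] args) {
        // Monday 2012-12-31 already belongs to week 1 of 2013.
        System.out.println(isoWeekDate(LocalDate.of(2012, 12, 31))); // 2013-W01
        // Friday 2010-01-01 still belongs to week 53 of 2009.
        System.out.println(isoWeekDate(LocalDate.of(2010, 1, 1)));   // 2009-W53
    }
}
```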
LDML does not use SHORT in the same context as wide, abbreviated, and narrow. The DTD only uses SHORT in the context of date, time and date/time format lengths and in zone/metazone.
DateTimeFormatterBuilder.appendLocalizedOffset refers to FormatStyle in the method javadoc but the argument to the method is TextStyle; instead of blurring the purpose of TextStyle, and more in keeping with LDML, the styles for ZoneId and ZoneOffset should perhaps use FormatStyle. It already has SHORT and it seems more consistent with LDML.
Renaming TextStyle.FULL to LONG is a good idea and matches LDML and the JDK. TextStyle.SHORT can stay as is (matching java.text) since TextStyle.SHORT is not the same as FormatStyle.SHORT.
There is also the question of whether LDML will change in the future.
LDML says: "Month, day, and quarter names may vary along two axes: the width and the context. The context is either format (the default), the form used within a date format string (such as "Saturday, November 12th"), or stand-alone, the form used independently, such as in calendar headers. The width can be wide (the default), abbreviated, or narrow; for days only, the width can also be short, which is ideally between the abbreviated and narrow widths, but must be no longer than abbreviated and no shorter than narrow (if short day names are not explicitly specified, abbreviated day names are used instead)."
So there are eight values "wide", "abbreviated", "short" and "narrow" plus standalone variants, where "short" is only used for day-of-week.
I don't think we're going to match the LDML names here, so we should choose what is best for Java. There, the question is what the difference really is between FormatStyle and TextStyle. My view has always been that format style is for the full descriptive forms, as in date style and time style. Everything else was text style. If we don't follow that, then where are the boundaries and why?
Agreed to leave TextStyle and FormatStyle as is. Just the removal of the 6 letter patterns to go.
6 letters removed in http://hg.openjdk.java.net/threeten/threeten/jdk/rev/28c446dcb53c
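The effect of dropping the 6 letter forms can be checked with a small probe. This sketch assumes the behaviour of DateTimeFormatter.ofPattern as it later shipped, where an over-long run of "E" is rejected with IllegalArgumentException; the class and method names are made up for illustration.

```java
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class SixLetterDemo {
    // Returns true if the pattern compiles, false if ofPattern rejects it.
    static boolean isAccepted(String pattern) {
        try {
            DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("EEEEE"));  // five letters: narrow style
        System.out.println(isAccepted("EEEEEE")); // six letters: no matching style
    }
}
```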