-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the negative pattern of textNumberPattern to resolve padding ambiguities #1139
Use the negative pattern of textNumberPattern to resolve padding ambiguities #1139
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I like this proposed approach better than the previously discussed ideas. I especially like that people can keep using their same textNumberPattern property and only need to add a character (was it *?) to the negative half of the pattern to solve their problem with the way ICU interprets the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
This needs release note.
This also needs documentation as a Daffodil extension to DFDL.
This should be presented to the DFDL workgroup as a needed fix to the DFDL spec. I will start a discussion thread there about this.
@@ -586,6 +597,29 @@ case class ConvertTextStandardNumberPrim(e: ElementBase) | |||
Nope | |||
} | |||
|
|||
// ICU does not have a way to set the pad position to after an affix if the positive pattern | |||
// does not have that affix. For example, "* 0", will always have a pad position of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs release note. This is a backward incompatible change (very minor one) because a "*0" in the negative pattern would have been always ignored before, but now means something that changes the behavior.
The warnings suggested in DAFFODIL-2870 are still meaningful, they just need to allow for the cases where the pad specifier in the negative pattern is used rather than ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do we capture and record a phrase or sentence which needs to be added to the release notes so we don't forget to include it later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention we've been using so far is to add the string Deprecation/Compatibility:
(or something similar) to the end of the commit message, followed by the message that should be added to release notes. Here's the commit message for this PR where the update can be seen:
…guities ICU only uses the positive pattern of the textNumberPattern property to determine pad character and position, completely ignoring the negative pattern. If the positive pattern has no affix associated with the pad character, then there is an ambiguity if the pad character should appear before or after the negative affix when unparsing negative numbers. In this case, ICU defaults to before the affix, with no way to change it via the pattern. This effectively means it is not possible for ICU number padding to appear after a negative affix if there is no positive affix. To resolve this ambiguity and allow configuring where pad characters appear, we inspect the negative pattern. If both negative and positive patterns define padding on the same affix, and the positive pattern has an empty string for that affix, then we use the pad position from the negative pattern. In all other cases, the pad character in the negative pattern is ignored following usual ICU behavior. For example, a textNumberPattern of "*0####0;-*00" formats a negative number with zero padding after the hyphen, whereas normal ICU behavior would ignore the negative pattern and zero pad before the hyphen. Deprecation/Compatibility: The pad character in the negative part of textNumberPattern is no longer ignored if the positive part of textNumberPattern defines a pad character without an associated affix (e.g. "*0###0;-*00"). In these cases, the position of the pad charcter in the negative part is used to define whether padding occurs before or after the negative affix. All other cases follow existing rules of textNumberPattern (i.e. the pad character in the negative part is ignored). DAFFODIL-2871
296236c
to
87f3b89
Compare
With no objections, I've removed "draft" from this, created an issue in Jira, and updated the commit message to reference the issue and mention the backwards incompatibility change. |
I accidentally deleted my forked branch, which closed the PR. I've restored the branch without and changes and reopened the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -586,6 +597,29 @@ case class ConvertTextStandardNumberPrim(e: ElementBase) | |||
Nope | |||
} | |||
|
|||
// ICU does not have a way to set the pad position to after an affix if the positive pattern | |||
// does not have that affix. For example, "* 0", will always have a pad position of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do we capture and record a phrase or sentence which needs to be added to the release notes so we don't forget to include it later?
Note
This is an alternative approach to #1138, which does feel like a bit of a hack and does not support other padding styles. Like that one, I've marked this as an RFC/Draft since it uses behavior not specified by the DFDL specification and gives meaning to something ICU normally ignores.
In the previous PR, there was some discussion about making the negative pattern mean something more. Although DAFFODIL-2870 suggests we shouldn't do that, I think this use is small enough that it is potentially acceptable. It also doesn't require changes to Daffodil parsing/unparsing (with possible performance implications) or extension properties mentioned in the other PR (e.g.
dfdlx:textNumberPositivePatttern
/dfdlx:textNumberNegativePattern
/dfdlx:icuPadPosition
). We definitely do need to be careful about what, if any, meaning we extract from normally ignored parts of the negative pattern, but the pad position seems like a reasonable exception to me considering possible ambiguities with the positive pattern.ICU only uses the positive pattern of the textNumberPattern property to determine pad character and position, completely ignoring the negative pattern. If the positive pattern has no affix associated with the pad character, then there is an ambiguity if the pad character should appear before or after the negative affix when unparsing negative numbers. In this case, ICU defaults to before the affix, with no way to change it via the pattern. This effectively means it is not possible for ICU number padding to appear after a negative affix if there is no positive affix.
To resolve this ambiguity and allow configuring where pad characters appear, we inspect the negative pattern. If both negative and positive patterns define padding on the same affix, and the positive pattern has an empty string for that affix, then we use the pad position from the negative pattern. In all other cases, the pad character in the negative pattern is ignored following usual ICU behavior.
For example, a textNumberPattern of "*0####0;-*00" formats a negative number with zero padding after the hyphen, whereas normal ICU behavior would ignore the negative pattern and zero pad before the hyphen.