Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the negative pattern of textNumberPattern to resolve padding ambiguities #1139

Conversation

stevedlawrence
Copy link
Member

Note

This is an alternative approach to #1138, which does feel like a bit of a hack and does not support other padding styles. Like that one, I've marked this as an RFC/Draft since it uses behavior not specified by the DFDL specification and gives meaning to something ICU normally ignores.

In the previous PR, there was some discussion about making the negative pattern mean something more. Although DAFFODIL-2870 suggests we shouldn't do that, I think this use is small enough that it is potentially acceptable. It also doesn't require changes to Daffodil parsing/unparsing (with possible performance implications) or extension properties mentioned in the other PR (e.g. dfdlx:textNumberPositivePatttern/dfdlx:textNumberNegativePattern/dfdlx:icuPadPosition). We definitely do need to be careful about what, if any, meaning we extract from normally ignored parts of the negative pattern, but the pad position seems like a reasonable exception to me considering possible ambiguities with the positive pattern.

ICU only uses the positive pattern of the textNumberPattern property to determine pad character and position, completely ignoring the negative pattern. If the positive pattern has no affix associated with the pad character, then there is an ambiguity if the pad character should appear before or after the negative affix when unparsing negative numbers. In this case, ICU defaults to before the affix, with no way to change it via the pattern. This effectively means it is not possible for ICU number padding to appear after a negative affix if there is no positive affix.

To resolve this ambiguity and allow configuring where pad characters appear, we inspect the negative pattern. If both negative and positive patterns define padding on the same affix, and the positive pattern has an empty string for that affix, then we use the pad position from the negative pattern. In all other cases, the pad character in the negative pattern is ignored following usual ICU behavior.

For example, a textNumberPattern of "*0####0;-*00" formats a negative number with zero padding after the hyphen, whereas normal ICU behavior would ignore the negative pattern and zero pad before the hyphen.

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I like this proposed approach better than the previously discussed ideas. I especially like that people can keep using their same textNumberPattern property and only need to add a character (was it *?) to the negative half of the pattern to solve their problem with the way ICU interprets the pattern.

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

This needs release note.

This also needs documentation as a Daffodil extension to DFDL.

This should be presented to the DFDL workgroup as a needed fix to the DFDL spec. I will start a discussion thread there about this.

@@ -586,6 +597,29 @@ case class ConvertTextStandardNumberPrim(e: ElementBase)
Nope
}

// ICU does not have a way to set the pad position to after an affix if the positive pattern
// does not have that affix. For example, "* 0", will always have a pad position of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs release note. This is a backward incompatible change (very minor one) because a "*0" in the negative pattern would have been always ignored before, but now means something that changes the behavior.

The warnings suggested in DAFFODIL-2870 are still meaningful, they just need to allow for the cases where the pad specifier in the negative pattern is used rather than ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we capture and record a phrase or sentence which needs to be added to the release notes so we don't forget to include it later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention we've been using so far is to add the string Deprecation/Compatibility: (or something similar) to the end of the commit message, followed by the message that should be added to release notes. Here's the commit message for this PR where the update can be seen:

87f3b89

…guities

ICU only uses the positive pattern of the textNumberPattern property to
determine pad character and position, completely ignoring the negative
pattern. If the positive pattern has no affix associated with the pad
character, then there is an ambiguity if the pad character should appear
before or after the negative affix when unparsing negative numbers. In
this case, ICU defaults to before the affix, with no way to change it
via the pattern. This effectively means it is not possible for ICU
number padding to appear after a negative affix if there is no positive
affix.

To resolve this ambiguity and allow configuring where pad characters
appear, we inspect the negative pattern. If both negative and positive
patterns define padding on the same affix, and the positive pattern has
an empty string for that affix, then we use the pad position from the
negative pattern. In all other cases, the pad character in the negative
pattern is ignored following usual ICU behavior.

For example, a textNumberPattern of "*0####0;-*00" formats a negative
number with zero padding after the hyphen, whereas normal ICU behavior
would ignore the negative pattern and zero pad before the hyphen.

Deprecation/Compatibility:

The pad character in the negative part of textNumberPattern is no longer
ignored if the positive part of textNumberPattern defines a pad
character without an associated affix (e.g. "*0###0;-*00"). In these
cases, the position of the pad charcter in the negative part is used to
define whether padding occurs before or after the negative affix. All
other cases follow existing rules of textNumberPattern (i.e. the pad
character in the negative part is ignored).

DAFFODIL-2871
@stevedlawrence stevedlawrence force-pushed the daffodil-DRAFT-ambiguous-prefix-position branch from 296236c to 87f3b89 Compare January 11, 2024 18:09
@stevedlawrence
Copy link
Member Author

With no objections, I've removed "draft" from this, created an issue in Jira, and updated the commit message to reference the issue and mention the backwards incompatibility change.

@stevedlawrence stevedlawrence deleted the daffodil-DRAFT-ambiguous-prefix-position branch January 16, 2024 15:26
@stevedlawrence stevedlawrence restored the daffodil-DRAFT-ambiguous-prefix-position branch January 16, 2024 15:27
@stevedlawrence stevedlawrence deleted the daffodil-DRAFT-ambiguous-prefix-position branch January 16, 2024 15:28
@stevedlawrence stevedlawrence restored the daffodil-DRAFT-ambiguous-prefix-position branch January 16, 2024 15:28
@stevedlawrence
Copy link
Member Author

I accidentally deleted my forked branch, which closed the PR. I've restored the branch without and changes and reopened the PR.

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -586,6 +597,29 @@ case class ConvertTextStandardNumberPrim(e: ElementBase)
Nope
}

// ICU does not have a way to set the pad position to after an affix if the positive pattern
// does not have that affix. For example, "* 0", will always have a pad position of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we capture and record a phrase or sentence which needs to be added to the release notes so we don't forget to include it later?

@stevedlawrence stevedlawrence merged commit cdb4121 into apache:main Jan 16, 2024
20 checks passed
@stevedlawrence stevedlawrence deleted the daffodil-DRAFT-ambiguous-prefix-position branch January 16, 2024 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants