-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(compiler): add i18nPreserveWhitespaceForLegacyExtraction
#56507
base: main
Are you sure you want to change the base?
Conversation
bc7d69d
to
1b3c1c4
Compare
231111a
to
ac378b8
Compare
bde716c
to
9a03487
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgp1130 looks great, thanks for the updates 👍 I've left a few minor comments, but none of them are blocking.
As a followup (in a separate PR), it might be great to put together a markdown file (e.g. packages/compiler/src/i18n/EXTRACTION.md
) and describe how the extraction process depends on whitespaces in messages and how different subsystems are configured to deal with it (e.g. ViewEngine-based extraction, code generation with goog.getMsg, etc). You have this information already in different comments in the code (which is very helpful!) and a document may help provide an overall terminology (e.g. what a "significant whitespace" means), how this works from high-level and we can also use the doc as a reference in comments to the code that you've added. This document will help us in the future to restore the context once we decide to perform further changes to this code.
Thank you.
9a03487
to
f6cdba8
Compare
43c24c7
to
e7835bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed-for: public-api
889e7a7
to
2e11426
Compare
55af290
to
c7e163c
Compare
7ae978f
to
4742312
Compare
a157152
to
0d3d143
Compare
a5fdc4a
to
e43049a
Compare
This configures whether or not to preserve whitespace content when extracting messages from Angular templates in the legacy (View Engine) extraction pipeline. The change to `i18n_parser.ts` is really a bug fix. The localization process requires that the nubmer of tokens in a message stays constant between i18n passes so it can reuse source spans. Therefore even empty text tokens need to be retained in order to maintain that invariant. However this bug fix does affect message IDs, and changing that could invalidate existing translations, therefore it is also gated by the same flag to maintain backwards compatibility.
…pace` and `preserveWhitespaces` are used together correctly This bans the case where `preserveSignificantWhitespace` is disabled but `preserveWhitespace` is enabled. Such a case would imply retaining all whitespace but also dropping significant whitespace which is not possible.
…Visitor` When disabling `i18nPreserveSignificantWhitespaceForLegacyExtraction` I was looking at a test case with ICU messages containing leading and trailing whitespace: ```angular <div i18n> {apples, plural, =other {I have many apples.}} </div> ``` This would historically generate two messages: ```javascript const MSG_TMP = goog.getMsg('{apples, plural, =other {I have many apples.}}'); const MSG_FOO = goog.getMsg(' {$ICU} ', { 'ICU': MSG_TMP }); ``` But I found that I was getting just one message: ```javascript const MSG_TMP = goog.getMsg(' {apples, plural, =other {I have many apples.}} '); ``` This is arguably an improvement, but changed the messages and message IDs, which isn't desirable with this option. I eventually traced this back to the `isIcu` initialization in [`i18n_parser.ts`](/packages/compiler/src/i18n/i18n_parser.ts): ```typescript const context: I18nMessageVisitorContext = { isIcu: nodes.length == 1 && nodes[0] instanceof html.Expansion, // ... }; ``` [`_I18nVisitor.prototype.visitExpansion`](/packages/compiler/src/i18n/i18n_parser.ts) uses this to decide whether or not to generate a sub-message for a given ICU expansion: ```typescript if (context.isIcu || context.icuDepth > 0) { // Returns an ICU node when: // - the message (vs a part of the message) is an ICU message, or // - the ICU message is nested. const expPh = context.placeholderRegistry.getUniquePlaceholder(`VAR_${icu.type}`); i18nIcu.expressionPlaceholder = expPh; context.placeholderToContent[expPh] = { text: icu.switchValue, sourceSpan: icu.switchValueSourceSpan, }; return context.visitNodeFn(icu, i18nIcu); } // Else returns a placeholder // ICU placeholders should not be replaced with their original content but with the their // translations. // TODO(vicb): add a html.Node -> i18n.Message cache to avoid having to re-create the msg const phName = context.placeholderRegistry.getPlaceholderName('ICU', icu.sourceSpan.toString()); context.placeholderToMessage[phName] = this.toI18nMessage([icu], '', '', '', undefined); const node = new i18n.IcuPlaceholder(i18nIcu, phName, icu.sourceSpan); return context.visitNodeFn(icu, node); ``` Note that `isIcu` is the key condition between these two cases and depends on whether or not the ICU expansion has any siblings. The introduction of `WhitespaceVisitor` to `I18nMetaVisitor` trims insignificant whitespace, including empty text nodes not adjacent to an ICU expansion (from [`WhitespaceVisitor.prototype.visitText`](/packages/compiler/src/ml_parser/html_whitespaces.ts)): ```typescript const isNotBlank = text.value.match(NO_WS_REGEXP); const hasExpansionSibling = context && (context.prev instanceof html.Expansion || context.next instanceof html.Expansion); if (isNotBlank || hasExpansionSibling) { // Transform node by trimming it... return trimmedNode; } return null; // Drop node which is empty and has no ICU expansion sibling. ``` `hasExpansionSibling` was intended to retain empty text nodes leading or trailing an ICU expansion, however `context` was `undefined`, so this check failed and the leading / trailing text nodes were dropped. This resulted in trimming the ICU text by dropping the leading / trailing whitespace nodes. Having only a single ICU expansion with no leading / trailing text nodes caused `_I18nVisitor` to initialize `isIcu` incorrectly and caused it to generate one message instead of two. `WhitespaceVisitor` is supposed to get this context from `visitAllWithSiblings`. So the fix here is to make sure `WhitespaceVisitor` is always visited via this function which provides the required context. I updated all usage sites to make sure this context is use consistently and implemented the `WhitespaceVisitor.prototype.visit` method to throw when the context is missing to make sure we don't encounter a similar mistake in the future. Unfortunately this broke one compliance test. Specifically the [`icu_logic/icu_only.js`](/home/douglasparker/Source/ng/packages/compiler-cli/test/compliance/test_cases/r3_view_compiler_i18n/icu_logic/icu_only.js) test which changed from generating: ```javascript function MyComponent_Template(rf, ctx) { if (rf & 1) { i0.ɵɵi18n(0, 0); } // ... } ``` To now generating: ```javascript function MyComponent_Template(rf, ctx) { if (rf & 1) { i0.ɵɵtext(0, " "); i0.ɵɵi18n(1, 0); i0.ɵɵtext(2, "\n"); } // ... } ``` This test uses the default value `preserveWhitespaces: false` (`i18nPreserveSignificantWhitespaceForLegacyExtraction` should not affect compiled JS output, we already retain significant whitespace there). So what this indicates to me is that ICU logic is already broken because it's not preserving significant whitespace in this case. My change is probably a bug fix, but one which would affect the compiled runtime, which is not in scope here. The root cause is because using `visitAllWithSiblings` everywhere means the context is retained correctly in this case and the whitespace is leading/trailing an ICU message, therefore it is retained per the logic of `WhitespaceVisitor.prototype.visitText` I mentioned eariler. To address this, I left one usage of `WhitespaceVisitor` using `html.visitAll` instead of `visitAllWithSiblings` to retain this bug. I has to lossen the assertion I put in `WhitespaceVisitor.prototype.visit` to make this possible, but it should still throw by default when misused, which is the important part.
e43049a
to
25a345f
Compare
This configures whether or not to preserve whitespace content when extracting messages from Angular templates in the legacy (View Engine) extraction pipeline. It defaults to
true
to maintain backwards compatibility, howeverfalse
is the ideal behavior since it makes translations less sensitive to unimportant whitespace changes.The change to
i18n_parser.ts
is really a bug fix. The localization process requires that the number of tokens in a message stays constant between i18n passes so it can reuse source spans. Therefore even empty text tokens need to be retained in order to maintain that invariant. However this bug fix does affect message IDs, and changing that could invalidate existing translations, therefore it is also gated by the same flag to maintain backwards compatibility.There are two other issues I discovered while adding this flag. First, Angular template expressions are factored in to the message ID calculation. Meaning changing from
Hello, {{ name }}
toHello, {{ firstName }}
will trigger retranslatation. That almost sounds correct, except that all expressions are replaced with something like{INTERPOLATION}
unless// ph(name = "...")
is used, so changing the expression would have no visible effect to translators anyways. Therefore, disablingi18nPreserveWhitespaceForLegacyExtraction
removes placeholders from the calculation.Second, ICU templates are broken up into multiple messages based on whitespace. So
<div i18n>{...}</div>
and<div i18n> {...} </div>
generate one and two$localize
messages apiece. Changing this would require updating the generated runtime code, which is out of scope for the time being here. Also reformatting ICUs in certain situations like{apples, plural, =other {Many apples.}}
->{apples, plural, =other {Many\napples.}}
still invalidates message IDs. It really shouldn't, but again, this is out of scope for the time being. Instead I'm just leaving ICUs as a known limitation of this flag with only basic support.