refactor(compiler): support interpolation and encoded entities when lexing markup #42062

petebacondarwin · 2021-05-12T13:16:00Z

The lexer now splits interpolation and encoded entity tokens out from the string it is tokenizing.

See individual commits...

jelbourn

LGTM

Reviewed-for: fw-playground

JoostK

I left a bunch of comments. I would be curious to know how this would behave in a TGP run, as that would give us a better feeling for how breaky this might be and/or which cases this doesn't handle correctly.

There are two typos in "refactor(compiler): support encoded entity tokens when lexing markup":

The lexer now splits encoded entity tokens out from text and attribute value tokens.
Previously encoded entities would be decoded and ~~there~~ their decoded value would be
included as part of the text token of the surrounding text. Now the entities will have
their own tokens. There are two scenarios: text and attribute values.

The HTML parser has been modified to ~~recombined~~ recombine these tokens to allow this
refactoring to have limited effect in this commit. Further refactorings
to use these new tokens will follow in subsequent commits.

JoostK · 2021-06-10T20:28:55Z

modules/playground/src/zippy_component/app/zippy.html

@@ -1,6 +1,6 @@
 <div class="zippy">
  <div (click)="toggle()" class="zippy__title">
-    {{ visible ? '&#x25BE;' : '&#x25B8;' }} {{title}}
+    {{ visible ? '\u25BE' : '\u25B8' }} {{title}}
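The two lines in this diff are meant to render the same glyphs. Here is a minimal sketch of why, using only standard JavaScript string APIs. The hex character reference `&#x25BE;` names code point U+25BE, and the JavaScript escape `'\u25BE'` produces that same character. `decodeHexEntity` is a hypothetical helper for illustration, not part of Angular's lexer, and it only handles the `&#xHHHH;` form.

```typescript
// Hypothetical helper (not Angular's decoder): decode one hexadecimal
// character reference such as "&#x25BE;". Returns null for anything else.
function decodeHexEntity(entity: string): string | null {
  const match = /^&#x([0-9a-fA-F]+);$/.exec(entity);
  if (match === null) {
    return null;
  }
  return String.fromCodePoint(parseInt(match[1], 16));
}

// The entity in the old template and the escape in the new one
// denote the same code point.
console.log(decodeHexEntity('&#x25BE;') === '\u25BE'); // true
console.log(decodeHexEntity('&#x25B8;') === '\u25B8'); // true
```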


I think this change also means this would technically be a breaking change?

One person's bug fix is another person's breaking change... but perhaps you are right 😞

No doubt there will be loads of these in G3...

Hmm, this was a refactoring commit. I wonder how I can get this BREAKING CHANGE into the changelog...
Perhaps I should rename this commit to a fix, where I am "fixing" the fact that interpolated blocks should not be decoding HTML entities??

@alxhub @JoostK - thoughts?

I have added a commit that ensures this breaking change does not happen.

Rather than make this change here, and revert it later, can we squash the fix commit (I'm guessing refactor(compiler): ensure that HTML entities in interpolations are decoded) into this one?

In general I'm a huge proponent of commits being independent - merging only half the commits from this PR shouldn't leave the repo in a broken state, for example.

This import is not used in the file, so can be removed. PR Close angular#42062

…ts (angular#42062) The compliance tests can check source-map segments against expectations encoded into the expectation files. Previously, the encoding of the expected segment was only delimited by whitespace, but this made it difficult to identify segments that started or ended with whitespace. Now these segment expectations are wrapped in double-quotes which makes it easier to read and understand the expectation files. PR Close angular#42062

This commit removes 9 cycles in the dependency graph of the compiler code. PR Close angular#42062

…ngular#42062) The lexer now splits interpolation tokens out from text tokens.

Previously the contents of `<div>Hello, {{ name}}</div>` would be a single text token. Now it will be three tokens:

```
TEXT: "Hello, "
INTERPOLATION: "{{", " name", "}}"
TEXT: ""
```

- INTERPOLATION tokens have three parts: "start marker", "expression" and "end marker".
- INTERPOLATION tokens are always preceded and followed by TEXT tokens, even if they represent an empty string.

The HTML parser has been modified to recombine these tokens to allow this refactoring to have limited effect in this commit. Further refactorings to use these new tokens will follow in subsequent commits. PR Close angular#42062
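To make the TEXT/INTERPOLATION token shapes concrete, here is a minimal sketch of a text tokenizer that produces that sequence. It is not Angular's lexer: the `Token` type and `tokenizeText` function are hypothetical, it assumes the default `{{`/`}}` markers, and it does not handle `}}` appearing inside a string literal within the expression. Giving an unterminated interpolation an empty end marker is also an assumption of this sketch.

```typescript
// Hypothetical token shapes for this sketch (not Angular's real types).
type Token =
  | {type: 'TEXT'; parts: [text: string]}
  | {type: 'INTERPOLATION'; parts: [startMarker: string, expression: string, endMarker: string]};

// Split text into TEXT and INTERPOLATION tokens. Every INTERPOLATION is
// preceded and followed by a TEXT token, which may be empty.
function tokenizeText(input: string, startMarker = '{{', endMarker = '}}'): Token[] {
  const tokens: Token[] = [];
  let text = '';
  let i = 0;
  while (i < input.length) {
    if (!input.startsWith(startMarker, i)) {
      text += input[i];
      i++;
      continue;
    }
    tokens.push({type: 'TEXT', parts: [text]});
    text = '';
    const exprStart = i + startMarker.length;
    const endIndex = input.indexOf(endMarker, exprStart);
    if (endIndex === -1) {
      // Unterminated: the expression runs to the end and the end marker is empty.
      tokens.push({type: 'INTERPOLATION', parts: [startMarker, input.slice(exprStart), '']});
      i = input.length;
    } else {
      tokens.push({
        type: 'INTERPOLATION',
        parts: [startMarker, input.slice(exprStart, endIndex), endMarker],
      });
      i = endIndex + endMarker.length;
    }
  }
  tokens.push({type: 'TEXT', parts: [text]});
  return tokens;
}

console.log(tokenizeText('Hello, {{ name}}'));
```

Emitting the trailing TEXT token unconditionally is what keeps the "always preceded and followed by TEXT" invariant cheap to rely on: a consumer can assume tokens alternate without checking for missing neighbours.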

This function is general purpose and by moving it into the `chars.ts` file along with similar helpers, it can be reused in the lexer, for instance. PR Close angular#42062
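A minimal sketch of such a character helper, assuming it treats single quotes, double quotes and backticks as quotes and takes a character code rather than a string. The constant names are hypothetical here, not copied from `chars.ts`.

```typescript
// Character codes for the three quote characters (names are illustrative).
const $DQ = 34; // "
const $SQ = 39; // '
const $BT = 96; // backtick

// Returns true if the character code is a single quote, double quote or backtick.
function isQuote(code: number): boolean {
  return code === $SQ || code === $DQ || code === $BT;
}

console.log(isQuote('"'.charCodeAt(0))); // true
console.log(isQuote('a'.charCodeAt(0))); // false
```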

…e values (angular#42062) The lexer now splits interpolation tokens out from attribute value tokens.

Previously the attribute value of `<div attr="Hello, {{ name}}">` would be a single token. Now it will be three tokens:

```
ATTR_VALUE_TEXT: "Hello, "
ATTR_VALUE_INTERPOLATION: "{{", " name", "}}"
ATTR_VALUE_TEXT: ""
```

- ATTR_VALUE_INTERPOLATION tokens have three parts: "start marker", "expression" and "end marker".
- ATTR_VALUE_INTERPOLATION tokens are always preceded and followed by ATTR_VALUE_TEXT tokens, even if they represent an empty string.

The HTML parser has been modified to recombine these tokens to allow this refactoring to have limited effect in this commit. Further refactorings to use these new tokens will follow in subsequent commits. PR Close angular#42062
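The commit notes that the parser recombines these tokens so that this refactoring has limited effect. A minimal sketch of that recombination, under the assumption that joining every part of every token in order reproduces the original attribute text. `AttrToken` and `recombineAttrValue` are hypothetical names, not Angular's parser API.

```typescript
// Hypothetical token shapes for this sketch (not Angular's real types).
type AttrToken =
  | {type: 'ATTR_VALUE_TEXT'; parts: [text: string]}
  | {type: 'ATTR_VALUE_INTERPOLATION'; parts: [startMarker: string, expression: string, endMarker: string]};

// Join every part of every token, in order, back into one attribute string,
// so that code after the parser still sees a single value.
function recombineAttrValue(tokens: AttrToken[]): string {
  let value = '';
  for (const token of tokens) {
    const parts: readonly string[] = token.parts;
    value += parts.join('');
  }
  return value;
}

const tokens: AttrToken[] = [
  {type: 'ATTR_VALUE_TEXT', parts: ['Hello, ']},
  {type: 'ATTR_VALUE_INTERPOLATION', parts: ['{{', ' name', '}}']},
  {type: 'ATTR_VALUE_TEXT', parts: ['']},
];
console.log(recombineAttrValue(tokens)); // Hello, {{ name}}
```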

…ngular#42062) The lexer now splits encoded entity tokens out from text and attribute value tokens.

Previously encoded entities would be decoded and the decoded value would be included as part of the text token of the surrounding text. Now the entities have their own tokens. There are two scenarios: text and attribute values.

Previously the contents of `<div>Hello &amp; goodbye</div>` would be a single TEXT token. Now it will be three tokens:

```
TEXT: "Hello "
ENCODED_ENTITY: "&", "&amp;"
TEXT: " goodbye"
```

Previously the attribute value in `<div title="Hello &amp; goodbye">` would be a single text token. Now it will be three tokens:

```
ATTR_VALUE_TEXT: "Hello "
ENCODED_ENTITY: "&", "&amp;"
ATTR_VALUE_TEXT: " goodbye"
```

- ENCODED_ENTITY tokens have two parts: "decoded" and "encoded".
- ENCODED_ENTITY tokens are always preceded and followed by either TEXT tokens or ATTR_VALUE_TEXT tokens, depending upon the context, even if they represent an empty string.

The HTML parser has been modified to recombine these tokens to allow this refactoring to have limited effect in this commit. Further refactorings to use these new tokens will follow in subsequent commits. PR Close angular#42062
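A minimal sketch of how a lexer might split named entities out of text into ENCODED_ENTITY tokens with `[decoded, encoded]` parts. It is hypothetical (`tokenizeEntities` and `NAMED_ENTITIES` are illustrative names) and recognises only four named entities. It ignores numeric references such as `&#x25BE;`, which a real HTML lexer must also handle. Unknown entity names are simply left in the surrounding TEXT.

```typescript
// Hypothetical token shapes for this sketch (not Angular's real types).
type EntityAwareToken =
  | {type: 'TEXT'; parts: [text: string]}
  | {type: 'ENCODED_ENTITY'; parts: [decoded: string, encoded: string]};

// A tiny subset of named entities, for illustration only.
const NAMED_ENTITIES = new Map<string, string>([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
]);

// Split text into TEXT and ENCODED_ENTITY tokens. Every entity is preceded and
// followed by a (possibly empty) TEXT token.
function tokenizeEntities(input: string): EntityAwareToken[] {
  const tokens: EntityAwareToken[] = [];
  const entityPattern = /&([a-zA-Z]+);/g;
  let lastIndex = 0;
  let match: RegExpExecArray | null;
  while ((match = entityPattern.exec(input)) !== null) {
    const decoded = NAMED_ENTITIES.get(match[1]);
    if (decoded === undefined) {
      continue; // unknown name: leave it inside the surrounding TEXT
    }
    tokens.push({type: 'TEXT', parts: [input.slice(lastIndex, match.index)]});
    tokens.push({type: 'ENCODED_ENTITY', parts: [decoded, match[0]]});
    lastIndex = match.index + match[0].length;
  }
  tokens.push({type: 'TEXT', parts: [input.slice(lastIndex)]});
  return tokens;
}

console.log(tokenizeEntities('Hello &amp; goodbye'));
```

Keeping both the decoded and the encoded form is the point of the two-part token: the decoded form is what the rendered text needs, while the encoded form's length is what source-span calculations need.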

When it is tokenized, text content is split into parts that can include interpolation and encoded entity tokens. To make this information available to downstream processing, this commit adds these tokens to the `Text` AST nodes, with suitable processing. PR Close angular#42062
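A minimal sketch of a text node that keeps its tokens alongside the decoded value. It assumes the node's `value` is built from each entity's decoded part and every other token's raw parts. The shapes here are hypothetical and simplified; they are not Angular's `ast.ts` classes.

```typescript
// Hypothetical, simplified token and node shapes (not Angular's ast.ts).
type TextPart =
  | {type: 'TEXT'; parts: [text: string]}
  | {type: 'INTERPOLATION'; parts: [start: string, expression: string, end: string]}
  | {type: 'ENCODED_ENTITY'; parts: [decoded: string, encoded: string]};

interface TextNode {
  value: string; // the decoded text value
  tokens: TextPart[]; // the fine-grained parts, kept for later processing
}

function buildTextNode(tokens: TextPart[]): TextNode {
  let value = '';
  for (const token of tokens) {
    if (token.type === 'ENCODED_ENTITY') {
      value += token.parts[0]; // decoded form, e.g. "&" for "&amp;"
    } else {
      value += token.parts.join(''); // raw text, or "{{ expr }}" as written
    }
  }
  return {value, tokens};
}

const node = buildTextNode([
  {type: 'TEXT', parts: ['Hi ']},
  {type: 'ENCODED_ENTITY', parts: ['&', '&amp;']},
  {type: 'TEXT', parts: [' ']},
  {type: 'INTERPOLATION', parts: ['{{', 'user', '}}']},
  {type: 'TEXT', parts: ['']},
]);
console.log(node.value); // Hi & {{user}}
```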

The tests were checking that the source-spans of parsed HTML nodes were accurate, but they were not checking the span when it includes the "leading trivia", which is given by the `fullStart` rather than the `start` location. PR Close angular#42062
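A minimal sketch of the `fullStart`/`start` distinction, assuming leading whitespace is the only trivia: `fullStart` points at the first trivia character while `start` skips past it. A test that only inspects `start` cannot notice a wrong `fullStart`. `Span` and `spanWithTrivia` are hypothetical names used only for this illustration.

```typescript
// Hypothetical span shape: `fullStart` includes leading trivia, `start` does not.
interface Span {
  fullStart: number;
  start: number;
  end: number;
}

// Build a span for source[fullStart, end), treating leading whitespace as trivia.
function spanWithTrivia(source: string, fullStart: number, end: number): Span {
  let start = fullStart;
  while (start < end && /\s/.test(source[start])) {
    start++;
  }
  return {fullStart, start, end};
}

const source = '<p>\n  Hello</p>';
const span = spanWithTrivia(source, 3, 11);
console.log(JSON.stringify(source.slice(span.fullStart, span.end))); // "\n  Hello"
console.log(JSON.stringify(source.slice(span.start, span.end))); // "Hello"
```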

…sages (angular#42062) Previously, the way templates were tokenized meant that we lost information about the location of interpolations if the template contained encoded HTML entities. This meant that the mapping back to the source interpolated strings could be offset incorrectly. Also, the source-span assigned to an i18n message did not include leading whitespace. This confused the output source-mappings so that the first text nodes of the message stopped at the first non-whitespace character. This commit makes use of the previous refactorings, where more fine-grained information was provided in text tokens, to enable the parser to identify the location of the interpolations in the original source more accurately. Fixes angular#41034 PR Close angular#42062
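A minimal sketch of the offset problem and one way to fix it. Searching the decoded text for `{{` gives an index that is four too small for every `&amp;` before it, because each five-character entity decodes to one character. Summing the encoded length of entity tokens instead recovers the true source offset. The token type and `interpolationSourceOffsets` are hypothetical, not Angular's implementation.

```typescript
// Hypothetical token shapes for this sketch (not Angular's real types).
type Token =
  | {type: 'TEXT'; parts: [text: string]}
  | {type: 'ENCODED_ENTITY'; parts: [decoded: string, encoded: string]}
  | {type: 'INTERPOLATION'; parts: [start: string, expression: string, end: string]};

// Offset of each interpolation in the original source, measured by walking tokens.
function interpolationSourceOffsets(tokens: Token[]): number[] {
  const offsets: number[] = [];
  let offset = 0;
  for (const token of tokens) {
    if (token.type === 'INTERPOLATION') {
      offsets.push(offset);
      offset += token.parts.join('').length;
    } else if (token.type === 'ENCODED_ENTITY') {
      offset += token.parts[1].length; // the encoded form is what the file contains
    } else {
      offset += token.parts[0].length;
    }
  }
  return offsets;
}

const source = 'Tom &amp; Jerry: {{ episode }}';
const decoded = source.replace(/&amp;/g, '&');

// Naive approach: searching the decoded text is off by 4 per "&amp;" before "{{".
console.log(decoded.indexOf('{{')); // 13
console.log(source.indexOf('{{')); // 17

const tokens: Token[] = [
  {type: 'TEXT', parts: ['Tom ']},
  {type: 'ENCODED_ENTITY', parts: ['&', '&amp;']},
  {type: 'TEXT', parts: [' Jerry: ']},
  {type: 'INTERPOLATION', parts: ['{{', ' episode ', '}}']},
  {type: 'TEXT', parts: ['']},
];
console.log(interpolationSourceOffsets(tokens)); // [ 17 ]
```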

…2062) These token interfaces will make it easier to reason about tokens in the parser and in specs. Previously, it was never clear what items could appear in the `parts` array of a token given a particular `TokenType`. Now, each token interface declares a labelled tuple for the parts, which helps to document the token better. PR Close angular#42062
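A minimal sketch of what labelled tuple parts look like in TypeScript (labelled tuple elements need TypeScript 4.0 or later). The interface names below are illustrative, not necessarily the ones the commit introduced. Narrowing on `type` tells the compiler exactly which parts exist, and the labels document what each position means.

```typescript
// Illustrative token interfaces with labelled tuple parts.
interface TextToken {
  type: 'TEXT';
  parts: [text: string];
}
interface InterpolationToken {
  type: 'INTERPOLATION';
  parts: [startMarker: string, expression: string, endMarker: string];
}
interface EncodedEntityToken {
  type: 'ENCODED_ENTITY';
  parts: [decoded: string, encoded: string];
}
type InterpolatedTextToken = TextToken | InterpolationToken | EncodedEntityToken;

// Narrowing on `type` selects the matching tuple shape for `parts`.
function describeToken(token: InterpolatedTextToken): string {
  if (token.type === 'TEXT') {
    return `text ${JSON.stringify(token.parts[0])}`;
  }
  if (token.type === 'INTERPOLATION') {
    const [startMarker, expression, endMarker] = token.parts;
    return `interpolation ${startMarker}${expression.trim()}${endMarker}`;
  }
  const [decoded, encoded] = token.parts;
  return `entity ${encoded} -> ${decoded}`;
}

console.log(describeToken({type: 'INTERPOLATION', parts: ['{{', ' name', '}}']}));
// interpolation {{name}}
console.log(describeToken({type: 'ENCODED_ENTITY', parts: ['&', '&amp;']}));
// entity &amp; -> &
```

With these declarations, reading `token.parts[3]` on an `InterpolationToken` is a compile-time error, which is the kind of guarantee a plain `string[]` for `parts` cannot give.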

… interpolations (angular#42062) Such interpolations turned up during internal testing at Google, so this commit adds a test to prevent regressions. PR Close angular#42062

…lar#42062) This is a simple tidy up commit to move to the more specific `===` comparison operator in the HTML lexer/parser. PR Close angular#42062

…butes (angular#42062) This tests a scenario that was failing in an internal project. PR Close angular#42062

…in attributes (angular#42062)" (angular#43033) This reverts commit fe12651. PR Close angular#43033

…er (angular#42062)" (angular#43033) This reverts commit 28b0c45. PR Close angular#43033

…rminated interpolations (angular#42062)" (angular#43033) This reverts commit 11ebe21. PR Close angular#43033

…ngular#42062)" (angular#43033) This reverts commit 9b3d4f5. PR Close angular#43033

…i18n messages (angular#42062)" (angular#43033) This reverts commit f08516d. PR Close angular#43033

angular#43033) This reverts commit 973f9b8. PR Close angular#43033

…#42062)" (angular#43033) This reverts commit 8a54896. PR Close angular#43033

… markup (angular#42062)" (angular#43033) This reverts commit 942b24d. PR Close angular#43033

…attribute values (angular#42062)" (angular#43033) This reverts commit c516e25. PR Close angular#43033

…#42062)" (angular#43033) This reverts commit 3d3b69f. PR Close angular#43033

…markup (angular#42062)" (angular#43033) This reverts commit c8a46bf. PR Close angular#43033

…" (angular#43033) This reverts commit 7585519. PR Close angular#43033

…ance tests (angular#42062)" (angular#43033) This reverts commit 29f9888. PR Close angular#43033

…r#43033) This reverts commit 42265cc. PR Close angular#43033

angular-automatic-lock-bot · 2021-10-09T16:05:51Z

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

_{This action has been performed automatically by a bot.}

google-cla bot added the cla: yes label May 12, 2021

petebacondarwin force-pushed the compiler-lex-interpolations branch 6 times, most recently from 049effb to c54d3dd May 13, 2021 17:29

atscott added the area: compiler label May 13, 2021

ngbot bot added this to the Backlog milestone May 13, 2021

petebacondarwin force-pushed the compiler-lex-interpolations branch 2 times, most recently from 6f94cb4 to ef6198e May 15, 2021 21:06

petebacondarwin changed the title ~~refactor(compiler): support interpolation tokens when lexing markup~~ refactor(compiler): support interpolation and encoded entities when lexing markup May 15, 2021

petebacondarwin force-pushed the compiler-lex-interpolations branch 2 times, most recently from 67eab89 to fd33222 May 19, 2021 09:58

petebacondarwin force-pushed the compiler-lex-interpolations branch 3 times, most recently from aa344d9 to bd5a087 June 9, 2021 15:11

petebacondarwin added the action: review and target: patch labels Jun 9, 2021

petebacondarwin marked this pull request as ready for review June 9, 2021 15:11

pullapprove bot requested review from jelbourn and JoostK June 9, 2021 15:11

petebacondarwin requested review from alxhub and removed request for jelbourn June 9, 2021 20:06

pullapprove bot requested a review from jelbourn June 9, 2021 20:06

jelbourn approved these changes Jun 9, 2021

JoostK reviewed Jun 10, 2021

petebacondarwin force-pushed the compiler-lex-interpolations branch from bd5a087 to 7bca035 June 11, 2021 10:14

alxhub reviewed Jun 11, 2021

packages/compiler/src/ml_parser/lexer.ts Outdated

packages/compiler/src/ml_parser/ast.ts Outdated

petebacondarwin force-pushed the compiler-lex-interpolations branch from 7bca035 to 451e3bf June 12, 2021 18:48

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

refactor(ngcc): remove unused import (angular#42062)

45a0509

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

refactor(compiler): remove cyclic dependencies (angular#42062)

1d697b7

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

test(compiler): add a test for parsing multiline expressions in attri…

4c8191a

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "test(compiler): add a test for parsing multiline expressions …

ce704eb

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "refactor(compiler): use === rather than == in the ml_pars…

b8983b7

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "test(compiler): check that the parser supports prematurely te…

e1a8dc7

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "refactor(compiler): define interfaces for each lexer token (a…

bfa55bf

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "fix(compiler): include leading whitespace in source-spans of …

97e99fa

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "test(compiler): check fullStart source-span (angular#42062)" (

8c4677f

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "refactor(compiler): expose token parts in Text nodes (angular…

ccd2264

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "refactor(compiler): support encoded entity tokens when lexing…

c11b1c4

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "refactor(compiler): support interpolation tokens when lexing …

bfa87a6

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "refactor(compiler): share isQuote() via chars.ts (angular…

dee68de

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "refactor(compiler): support interpolation tokens when lexing …

8ad02f6

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "refactor(compiler): remove cyclic dependencies (angular#42062)…

6ef596f

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "test(compiler-cli): clarify source-map expectations in compli…

073a04a

TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021

Revert "refactor(ngcc): remove unused import (angular#42062)" (angula…

55946e6

angular-automatic-lock-bot bot locked and limited conversation to collaborators Oct 9, 2021

petebacondarwin commented May 12, 2021 • edited

jelbourn left a comment

JoostK left a comment

JoostK Jun 10, 2021

petebacondarwin Jun 11, 2021

petebacondarwin Jun 14, 2021

petebacondarwin Jun 14, 2021

petebacondarwin Jun 15, 2021

alxhub Jun 28, 2021

angular-automatic-lock-bot bot commented Oct 9, 2021
