Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(compiler): support interpolation and encoded entities when lexing markup #42062

Conversation

petebacondarwin
Copy link
Member

@petebacondarwin petebacondarwin commented May 12, 2021

The lexer now splits interpolation and encoded entity tokens out from the string it is tokenizing.

See individual commits...

Fixes #41034

@google-cla google-cla bot added the cla: yes label May 12, 2021
@petebacondarwin petebacondarwin force-pushed the compiler-lex-interpolations branch 6 times, most recently from 049effb to c54d3dd Compare May 13, 2021 17:29
@atscott atscott added the area: compiler Issues related to `ngc`, Angular's template compiler label May 13, 2021
@ngbot ngbot bot added this to the Backlog milestone May 13, 2021
@petebacondarwin petebacondarwin force-pushed the compiler-lex-interpolations branch 2 times, most recently from 6f94cb4 to ef6198e Compare May 15, 2021 21:06
@petebacondarwin petebacondarwin changed the title refactor(compiler): support interpolation tokens when lexing markup refactor(compiler): support interpolation and encoded entities when lexing markup May 15, 2021
@petebacondarwin petebacondarwin force-pushed the compiler-lex-interpolations branch 2 times, most recently from 67eab89 to fd33222 Compare May 19, 2021 09:58
@petebacondarwin petebacondarwin force-pushed the compiler-lex-interpolations branch 3 times, most recently from aa344d9 to bd5a087 Compare June 9, 2021 15:11
@petebacondarwin petebacondarwin added action: review The PR is still awaiting reviews from at least one requested reviewer target: patch This PR is targeted for the next patch release labels Jun 9, 2021
@petebacondarwin petebacondarwin marked this pull request as ready for review June 9, 2021 15:11
@pullapprove pullapprove bot requested review from jelbourn and JoostK June 9, 2021 15:11
@petebacondarwin petebacondarwin requested review from alxhub and removed request for jelbourn June 9, 2021 20:06
@pullapprove pullapprove bot requested a review from jelbourn June 9, 2021 20:06
Copy link
Member

@jelbourn jelbourn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Reviewed-for: fw-playground

Copy link
Member

@JoostK JoostK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a bunch of comments. I would be curious to know how this would behave in TGP run, as that would give us a better feeling for how breaky this might be and/or which cases this doesn't handle correctly.


There's two typos in "refactor(compiler): support encoded entity tokens when lexing markup":

The lexer now splits encoded entity tokens out from text and attribute value tokens.
Previously encoded entities would be decoded and there their decoded value would be
included as part of the text token of the surrounding text. Now the entities will have
their own tokens. There are two scenarios: text and attribute values.

The HTML parser has been modified to recombined recombine these tokens to allow this
refactoring to have limited effect in this commit. Further refactorings
to use these new tokens will follow in subsequent commits.

@@ -1,6 +1,6 @@
<div class="zippy">
<div (click)="toggle()" class="zippy__title">
{{ visible ? '&#x25BE;' : '&#x25B8;' }} {{title}}
{{ visible ? '\u25BE' : '\u25B8' }} {{title}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change also means this would technically be a breaking change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One person's bug fix is another person's breaking change... but perhaps you are right 😞

No doubt there will be loads of these in G3...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this was a refactoring commit. I wonder how I can get this BREAKING CHANGE into the change log...
Perhaps I should rename this commit to a fix? Where I am "fixing" the fact that interpolated blocks should not be decoding HTML entities??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alxhub @JoostK - thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a commit that ensures this breaking change does not happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than make this change here, and revert it later, can we squash the fix commit (I'm guessing refactor(compiler): ensure that HTML entities in interpolations are decoded) into this one?

In general I'm a huge proponent of commits being independent - merging only half the commits from this PR shouldn't leave the repo in a broken state, for example.

packages/compiler/src/ml_parser/lexer.ts Outdated Show resolved Hide resolved
packages/compiler/src/ml_parser/lexer.ts Outdated Show resolved Hide resolved
packages/compiler/src/ml_parser/lexer.ts Show resolved Hide resolved
packages/compiler/src/ml_parser/parser.ts Outdated Show resolved Hide resolved
packages/compiler/src/ml_parser/html_whitespaces.ts Outdated Show resolved Hide resolved
packages/compiler/src/ml_parser/html_whitespaces.ts Outdated Show resolved Hide resolved
packages/compiler/src/ml_parser/parser.ts Outdated Show resolved Hide resolved
packages/compiler/src/i18n/i18n_parser.ts Outdated Show resolved Hide resolved
packages/compiler/src/i18n/i18n_parser.ts Show resolved Hide resolved
packages/compiler/src/ml_parser/lexer.ts Outdated Show resolved Hide resolved
packages/compiler/src/ml_parser/ast.ts Outdated Show resolved Hide resolved
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
This import is not used in the file, so can be removed.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
…ts (angular#42062)

The compliance tests can check source-map segments against expectations
encoded into the expectation files. Previously, the encoding of the expected
segment was only delimited by whitespace, but this made it difficult to identify
segments that started or ended with whitespace.

Now these segment expectations are wrapped in double-quotes which makes
it easier to read and understand the expectation files.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
This commit removes 9 cycles in the dependency graph of the compiler code.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
…ngular#42062)

The lexer now splits interpolation tokens out from text tokens.

Previously the contents of `<div>Hello, {{ name}}<div>` would be a single
text token. Now it will be three tokens:

```
TEXT: "Hello, "
INTERPOLATION: "{{", " name", "}}"
TEXT: ""
```

- INTERPOLATION tokens have three parts, "start marker", "expression"
  and "end marker".
- INTERPOLATION tokens are always preceded and followed by TEXT tokens,
  even if they represent an empty string.

The HTML parser has been modified to recombine these tokens to allow this
refactoring to have limited effect in this commit. Further refactorings
to use these new tokens will follow in subsequent commits.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
This function is general purpose and by moving it into the
`chars.ts` file along with similar helpers, it can be reused
in the lexer, for instance.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
…e values (angular#42062)

The lexer now splits interpolation tokens out from attribute value tokens.
Previously the attribute value of `<div attr="Hello, {{ name}}">` would be a single
token. Now it will be three tokens:

```
ATTR_VALUE_TEXT: "Hello, "
ATTR_VALUE_INTERPOLATION: "{{", " name", "}}"
ATTR_VALUE_TEXT: ""
```

- ATTR_VALUE_INTERPOLATION tokens have three parts, "start marker",
  "expression" and "end marker".
- ATTR_VALUE_INTERPOLATION tokens are always preceded and followed
  by TEXT tokens, even if they represent an empty string.

The HTML parser has been modified to recombine these tokens to allow this
refactoring to have limited effect in this commit. Further refactorings
to use these new tokens will follow in subsequent commits.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
…ngular#42062)

The lexer now splits encoded entity tokens out from text and attribute value tokens.

Previously encoded entities would be decoded and the decoded value would be
included as part of the text token of the surrounding text. Now the entities
have their own tokens. There are two scenarios: text and attribute values.

Previously the contents of `<div>Hello &amp; goodbye</div>` would be a single
TEXT token. Now it will be three tokens:

```
TEXT: "Hello "
ENCODED_ENTITY: "&", "&amp;"
TEXT: " goodbye"
```

Previously the attribute value in `<div title="Hello &amp; goodbye">` would be
a single text token. Now it will be three tokens:

```
ATTR_VALUE_TEXT: "Hello "
ENCODED_ENTITY: "&", "&amp;"
ATTR_VALUE_TEXT: " goodbye"
```

- ENCODED_ENTITY tokens have two parts: "decoded" and "encoded".
- ENCODED_ENTITY tokens are always preceded and followed by either TEXT tokens
  or ATTR_VALUE_TEXT tokens, depending upon the context, even if they represent
  an empty string.

The HTML parser has been modified to recombine these tokens to allow this
refactoring to have limited effect in this commit. Further refactorings
to use these new tokens will follow in subsequent commits.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
When it was tokenized, text content is split into parts that can include
interpolations and encoded entities tokens.

To make this information available to downstream processing, this commit
adds these tokens to the `Text` AST nodes, with suitable processing.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
The tests were checking that the source-span of parsed HTML nodes were
accurate, but they were not checking the span when it includes the
"leading trivia", which are given by the `fullStart` rather than `start`
location.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
…sages (angular#42062)

Previously, the way templates were tokenized meant that we lost information
about the location of interpolations if the template contained encoded HTML
entities. This meant that the mapping back to the source interpolated strings
could be offset incorrectly.

Also, the source-span assigned to an i18n message did not include leading
whitespace. This confused the output source-mappings so that the first text
nodes of the message stopped at the first non-whitespace character.

This commit makes use of the previous refactorings, where more fine grain
information was provided in text tokens, to enable the parser to identify
the location of the interpolations in the original source more accurately.

Fixes angular#41034

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
…2062)

These token interfaces will make it easier to reason about tokens in the
parser and in specs.

Previously, it was never clear what items could appear in the `parts`
array of a token given a particular `TokenType`. Now, each token interface
declares a labelled tuple for the parts, which helps to document the token
better.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
… interpolations (angular#42062)

Such interpolations turned up during internal testing at Google, so this
commit adds a test to prevent regressions.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
…lar#42062)

This is a simple tidy up commit to move to the more specific `===`
comparison operator in the HTML lexer/parser.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
…butes (angular#42062)

This tests a scenario that was failing in an internal project.

PR Close angular#42062
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
TeriGlover pushed a commit to TeriGlover/angular that referenced this pull request Sep 22, 2021
@angular-automatic-lock-bot
Copy link

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot angular-automatic-lock-bot bot locked and limited conversation to collaborators Oct 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
action: merge The PR is ready for merge by the caretaker area: compiler Issues related to `ngc`, Angular's template compiler cla: yes target: patch This PR is targeted for the next patch release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

extract-i18n with HTML entities and interpolation
6 participants