refactor(compiler): support interpolation and encoded entities when lexing markup #42062
LGTM
Reviewed-for: fw-playground
I left a bunch of comments. I would be curious to know how this would behave in TGP run, as that would give us a better feeling for how breaky this might be and/or which cases this doesn't handle correctly.
There are two typos in "refactor(compiler): support encoded entity tokens when lexing markup":
The lexer now splits encoded entity tokens out from text and attribute value tokens.
Previously encoded entities would be decoded and ~~there~~ their decoded value would be
included as part of the text token of the surrounding text. Now the entities will have
their own tokens. There are two scenarios: text and attribute values.
The HTML parser has been modified to ~~recombined~~ recombine these tokens to allow this
refactoring to have limited effect in this commit. Further refactorings
to use these new tokens will follow in subsequent commits.
@@ -1,6 +1,6 @@
 <div class="zippy">
   <div (click)="toggle()" class="zippy__title">
-    {{ visible ? '▾' : '▸' }} {{title}}
+    {{ visible ? '\u25BE' : '\u25B8' }} {{title}}
I think this change also means this would technically be a breaking change?
One person's bug fix is another person's breaking change... but perhaps you are right 😞
No doubt there will be loads of these in G3...
Hmm, this was a refactoring commit. I wonder how I can get this BREAKING CHANGE into the change log...
Perhaps I should rename this commit to a `fix`? Where I am "fixing" the fact that interpolated blocks should not be decoding HTML entities??
I have added a commit that ensures this breaking change does not happen.
Rather than make this change here, and revert it later, can we squash the fix commit (I'm guessing `refactor(compiler): ensure that HTML entities in interpolations are decoded`) into this one?
In general I'm a huge proponent of commits being independent - merging only half the commits from this PR shouldn't leave the repo in a broken state, for example.
This import is not used in the file, so can be removed. PR Close angular#42062
…ts (angular#42062) The compliance tests can check source-map segments against expectations encoded into the expectation files. Previously, the encoding of the expected segment was only delimited by whitespace, but this made it difficult to identify segments that started or ended with whitespace. Now these segment expectations are wrapped in double-quotes which makes it easier to read and understand the expectation files. PR Close angular#42062
This commit removes 9 cycles in the dependency graph of the compiler code. PR Close angular#42062
…ngular#42062) The lexer now splits interpolation tokens out from text tokens. Previously the contents of `<div>Hello, {{ name}}</div>` would be a single text token. Now it will be three tokens:
```
TEXT: "Hello, "
INTERPOLATION: "{{", " name", "}}"
TEXT: ""
```
- INTERPOLATION tokens have three parts: "start marker", "expression" and "end marker".
- INTERPOLATION tokens are always preceded and followed by TEXT tokens, even if they represent an empty string.

The HTML parser has been modified to recombine these tokens to allow this refactoring to have limited effect in this commit. Further refactorings to use these new tokens will follow in subsequent commits. PR Close angular#42062
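To make the token sequence described in this commit message concrete, here is a minimal sketch of splitting text into TEXT and INTERPOLATION tokens. `SketchToken` and `lexTextSketch` are hypothetical names for illustration only and are not the Angular compiler's implementation; the sketch assumes `{{` / `}}` markers and ignores quotes, comments and unterminated interpolations.

```typescript
// Hypothetical token shapes for illustration; not the compiler's actual types.
type SketchToken =
  | { type: 'TEXT'; parts: [text: string] }
  | { type: 'INTERPOLATION'; parts: [startMarker: string, expression: string, endMarker: string] };

// Split `input` into TEXT and INTERPOLATION tokens, so that every
// INTERPOLATION is preceded and followed by a (possibly empty) TEXT token.
function lexTextSketch(input: string): SketchToken[] {
  const tokens: SketchToken[] = [];
  let pos = 0;
  while (true) {
    const start = input.indexOf('{{', pos);
    const end = start === -1 ? -1 : input.indexOf('}}', start + 2);
    if (start === -1 || end === -1) {
      // No further complete interpolation: the remainder is one TEXT token.
      tokens.push({ type: 'TEXT', parts: [input.slice(pos)] });
      return tokens;
    }
    tokens.push({ type: 'TEXT', parts: [input.slice(pos, start)] });
    tokens.push({ type: 'INTERPOLATION', parts: ['{{', input.slice(start + 2, end), '}}'] });
    pos = end + 2;
  }
}

// Yields TEXT "Hello, ", INTERPOLATION "{{", " name", "}}", then an empty TEXT.
console.log(lexTextSketch('Hello, {{ name}}'));
```

Emitting the trailing empty TEXT token keeps the stream regular, so a consumer can presumably rely on the TEXT / INTERPOLATION / TEXT alternation without special-casing the ends.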
This function is general purpose and by moving it into the `chars.ts` file along with similar helpers, it can be reused in the lexer, for instance. PR Close angular#42062
…e values (angular#42062) The lexer now splits interpolation tokens out from attribute value tokens. Previously the attribute value of `<div attr="Hello, {{ name}}">` would be a single token. Now it will be three tokens:
```
ATTR_VALUE_TEXT: "Hello, "
ATTR_VALUE_INTERPOLATION: "{{", " name", "}}"
ATTR_VALUE_TEXT: ""
```
- ATTR_VALUE_INTERPOLATION tokens have three parts: "start marker", "expression" and "end marker".
- ATTR_VALUE_INTERPOLATION tokens are always preceded and followed by ATTR_VALUE_TEXT tokens, even if they represent an empty string.

The HTML parser has been modified to recombine these tokens to allow this refactoring to have limited effect in this commit. Further refactorings to use these new tokens will follow in subsequent commits. PR Close angular#42062
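The recombination step mentioned above can be pictured as concatenating the token parts back into the single attribute value that earlier consumers expected. The following is only a sketch under that assumption; `AttrSketchToken` and `recombineAttrValueSketch` are hypothetical names, not the parser's actual code.

```typescript
// Hypothetical token shapes for illustration; not the compiler's actual types.
type AttrSketchToken =
  | { type: 'ATTR_VALUE_TEXT'; parts: [text: string] }
  | {
      type: 'ATTR_VALUE_INTERPOLATION';
      parts: [startMarker: string, expression: string, endMarker: string];
    };

// Rebuild the single string value that the lexer used to emit as one token.
function recombineAttrValueSketch(tokens: AttrSketchToken[]): string {
  let value = '';
  for (const token of tokens) {
    switch (token.type) {
      case 'ATTR_VALUE_TEXT':
        value += token.parts[0];
        break;
      case 'ATTR_VALUE_INTERPOLATION':
        // Markers and expression are kept verbatim.
        value += token.parts.join('');
        break;
    }
  }
  return value;
}

const attrTokens: AttrSketchToken[] = [
  { type: 'ATTR_VALUE_TEXT', parts: ['Hello, '] },
  { type: 'ATTR_VALUE_INTERPOLATION', parts: ['{{', ' name', '}}'] },
  { type: 'ATTR_VALUE_TEXT', parts: [''] },
];
console.log(recombineAttrValueSketch(attrTokens)); // prints "Hello, {{ name}}"
```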
…ngular#42062) The lexer now splits encoded entity tokens out from text and attribute value tokens. Previously encoded entities would be decoded and the decoded value would be included as part of the text token of the surrounding text. Now the entities have their own tokens. There are two scenarios: text and attribute values.

Previously the contents of `<div>Hello &amp; goodbye</div>` would be a single TEXT token. Now it will be three tokens:
```
TEXT: "Hello "
ENCODED_ENTITY: "&", "&amp;"
TEXT: " goodbye"
```
Previously the attribute value in `<div title="Hello &amp; goodbye">` would be a single text token. Now it will be three tokens:
```
ATTR_VALUE_TEXT: "Hello "
ENCODED_ENTITY: "&", "&amp;"
ATTR_VALUE_TEXT: " goodbye"
```
- ENCODED_ENTITY tokens have two parts: "decoded" and "encoded".
- ENCODED_ENTITY tokens are always preceded and followed by either TEXT tokens or ATTR_VALUE_TEXT tokens, depending upon the context, even if they represent an empty string.

The HTML parser has been modified to recombine these tokens to allow this refactoring to have limited effect in this commit. Further refactorings to use these new tokens will follow in subsequent commits. PR Close angular#42062
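As an illustration of the entity splitting described in this commit message, the sketch below separates known named entities from the surrounding text and keeps both the decoded and the encoded form. The names `SKETCH_ENTITIES`, `EntitySketchToken` and `splitEntitiesSketch` are hypothetical, and the four-entry table is an assumption made to keep the example short; the real lexer handles far more entities than this.

```typescript
// Tiny illustrative table; an assumption for this sketch only.
const SKETCH_ENTITIES = new Map<string, string>([
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
]);

// Hypothetical token shapes for illustration; not the compiler's actual types.
type EntitySketchToken =
  | { type: 'TEXT'; parts: [text: string] }
  | { type: 'ENCODED_ENTITY'; parts: [decoded: string, encoded: string] };

// Split `input` so that each known entity gets its own token,
// surrounded by (possibly empty) TEXT tokens.
function splitEntitiesSketch(input: string): EntitySketchToken[] {
  const tokens: EntitySketchToken[] = [];
  const entityPattern = /&([a-z]+);/g;
  let pos = 0;
  let match: RegExpExecArray | null;
  while ((match = entityPattern.exec(input)) !== null) {
    const decoded = SKETCH_ENTITIES.get(match[1] ?? '');
    if (decoded === undefined) {
      continue; // Unknown name: leave it as part of the surrounding text.
    }
    tokens.push({ type: 'TEXT', parts: [input.slice(pos, match.index)] });
    tokens.push({ type: 'ENCODED_ENTITY', parts: [decoded, match[0]] });
    pos = match.index + match[0].length;
  }
  tokens.push({ type: 'TEXT', parts: [input.slice(pos)] });
  return tokens;
}

// Yields TEXT "Hello ", ENCODED_ENTITY "&", "&amp;", TEXT " goodbye".
console.log(splitEntitiesSketch('Hello &amp; goodbye'));
```

Keeping the encoded form alongside the decoded one is what lets later stages measure the entity's length in the original source.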
When text content is tokenized, it is split into parts that can include interpolation and encoded entity tokens. To make this information available to downstream processing, this commit adds these tokens to the `Text` AST nodes, with suitable processing. PR Close angular#42062
The tests were checking that the source-spans of parsed HTML nodes were accurate, but they were not checking the span when it includes the "leading trivia", which is given by the `fullStart` rather than the `start` location. PR Close angular#42062
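The difference between `start` and `fullStart` can be sketched with plain character offsets. This is an illustration only: `SpanSketch` and `spanOfTextSketch` are hypothetical, and treating leading whitespace as the "leading trivia" is an assumption made for the example.

```typescript
// Hypothetical span using plain offsets; not the compiler's span class.
interface SpanSketch {
  fullStart: number; // includes leading trivia (here: leading whitespace)
  start: number; // first non-trivia character
  end: number;
}

// Compute a span for the text between `from` and `to`, skipping leading
// whitespace for `start` but keeping it for `fullStart`.
function spanOfTextSketch(source: string, from: number, to: number): SpanSketch {
  let start = from;
  while (start < to && /\s/.test(source.charAt(start))) {
    start++;
  }
  return { fullStart: from, start, end: to };
}

const template = '<div>\n  hello</div>';
const span = spanOfTextSketch(template, 5, 13);
console.log(JSON.stringify(template.slice(span.start, span.end))); // prints "hello"
console.log(JSON.stringify(template.slice(span.fullStart, span.end))); // prints "\n  hello"
```

A test that only inspects `start` would pass even if `fullStart` were wrong, which is the gap this commit message describes.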
…sages (angular#42062) Previously, the way templates were tokenized meant that we lost information about the location of interpolations if the template contained encoded HTML entities. This meant that the mapping back to the source interpolated strings could be offset incorrectly. Also, the source-span assigned to an i18n message did not include leading whitespace. This confused the output source-mappings so that the first text nodes of the message stopped at the first non-whitespace character. This commit makes use of the previous refactorings, where more fine-grained information was provided in text tokens, to enable the parser to identify the location of the interpolations in the original source more accurately. Fixes angular#41034 PR Close angular#42062
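To see why decoded entities can shift interpolation locations, consider a sketch that walks the fine-grained parts and tracks the offset in the original source rather than in the decoded text. `PartSketch` and `interpolationSourceOffsets` are hypothetical names; the actual parser logic is not shown in this thread and may differ.

```typescript
// Hypothetical part shapes for illustration; not the compiler's actual types.
type PartSketch =
  | { kind: 'text'; text: string }
  | { kind: 'entity'; decoded: string; encoded: string }
  | { kind: 'interpolation'; source: string };

// Return the offset of each interpolation in the ORIGINAL source by summing
// the encoded length of every part that precedes it.
function interpolationSourceOffsets(parts: PartSketch[]): number[] {
  const offsets: number[] = [];
  let sourceOffset = 0;
  for (const part of parts) {
    if (part.kind === 'interpolation') {
      offsets.push(sourceOffset);
      sourceOffset += part.source.length;
    } else if (part.kind === 'entity') {
      sourceOffset += part.encoded.length; // "&amp;" is 5 characters, "&" is 1
    } else {
      sourceOffset += part.text.length;
    }
  }
  return offsets;
}

// Parts for the source text "a &amp; b {{ x }}".
const parts: PartSketch[] = [
  { kind: 'text', text: 'a ' },
  { kind: 'entity', decoded: '&', encoded: '&amp;' },
  { kind: 'text', text: ' b ' },
  { kind: 'interpolation', source: '{{ x }}' },
];
console.log(interpolationSourceOffsets(parts)); // prints [ 10 ]
```

In the decoded string `a & b {{ x }}` the interpolation starts at offset 6, not 10, which is the kind of mismatch that a mapping built from decoded text alone would produce.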
…2062) These token interfaces will make it easier to reason about tokens in the parser and in specs. Previously, it was never clear what items could appear in the `parts` array of a token given a particular `TokenType`. Now, each token interface declares a labelled tuple for the parts, which helps to document the token better. PR Close angular#42062
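A rough idea of what "a labelled tuple for the parts" buys can be given with a small discriminated union. The names below (`SketchTokenType`, `TextTokenSketch`, and so on) are hypothetical and chosen for this example; they are not claimed to match the interfaces added by the commit.

```typescript
// Hypothetical token types and interfaces, for illustration only.
enum SketchTokenType {
  TEXT,
  INTERPOLATION,
  ENCODED_ENTITY,
}

interface TextTokenSketch {
  type: SketchTokenType.TEXT;
  parts: [text: string];
}

interface InterpolationTokenSketch {
  type: SketchTokenType.INTERPOLATION;
  parts: [startMarker: string, expression: string, endMarker: string];
}

interface EncodedEntityTokenSketch {
  type: SketchTokenType.ENCODED_ENTITY;
  parts: [decoded: string, encoded: string];
}

type AnyTokenSketch = TextTokenSketch | InterpolationTokenSketch | EncodedEntityTokenSketch;

// Narrowing on `type` tells the compiler exactly which parts are available.
function describeTokenSketch(token: AnyTokenSketch): string {
  switch (token.type) {
    case SketchTokenType.TEXT: {
      const [text] = token.parts;
      return `text(${JSON.stringify(text)})`;
    }
    case SketchTokenType.INTERPOLATION: {
      const [, expression] = token.parts;
      return `interpolation(${expression.trim()})`;
    }
    case SketchTokenType.ENCODED_ENTITY: {
      const [decoded, encoded] = token.parts;
      return `entity(${encoded} -> ${decoded})`;
    }
  }
}

console.log(
  describeTokenSketch({ type: SketchTokenType.INTERPOLATION, parts: ['{{', ' name', '}}'] }),
); // prints "interpolation(name)"
```

With a plain `string[]` for `parts`, reading `parts[1]` of a text token would type-check; with the tuple types above it is a compile-time error.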
… interpolations (angular#42062) Such interpolations turned up during internal testing at Google, so this commit adds a test to prevent regressions. PR Close angular#42062
…lar#42062) This is a simple tidy up commit to move to the more specific `===` comparison operator in the HTML lexer/parser. PR Close angular#42062
…butes (angular#42062) This tests a scenario that was failing in an internal project. PR Close angular#42062
…in attributes (angular#42062)" (angular#43033) This reverts commit fe12651. PR Close angular#43033
…er (angular#42062)" (angular#43033) This reverts commit 28b0c45. PR Close angular#43033
…rminated interpolations (angular#42062)" (angular#43033) This reverts commit 11ebe21. PR Close angular#43033
…ngular#42062)" (angular#43033) This reverts commit 9b3d4f5. PR Close angular#43033
…i18n messages (angular#42062)" (angular#43033) This reverts commit f08516d. PR Close angular#43033
angular#43033) This reverts commit 973f9b8. PR Close angular#43033
…#42062)" (angular#43033) This reverts commit 8a54896. PR Close angular#43033
… markup (angular#42062)" (angular#43033) This reverts commit 942b24d. PR Close angular#43033
…attribute values (angular#42062)" (angular#43033) This reverts commit c516e25. PR Close angular#43033
…#42062)" (angular#43033) This reverts commit 3d3b69f. PR Close angular#43033
…markup (angular#42062)" (angular#43033) This reverts commit c8a46bf. PR Close angular#43033
…" (angular#43033) This reverts commit 7585519. PR Close angular#43033
…ance tests (angular#42062)" (angular#43033) This reverts commit 29f9888. PR Close angular#43033
…r#43033) This reverts commit 42265cc. PR Close angular#43033
This issue has been automatically locked due to inactivity. Read more about our automatic conversation locking policy. This action has been performed automatically by a bot.
The lexer now splits interpolation and encoded entity tokens out from the string it is tokenizing.
See individual commits...
Fixes #41034