Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extraneous whitespace in expression ASTs #33477

Open
ayazhafiz opened this issue Oct 29, 2019 · 1 comment 路 May be fixed by #34690
Open

Extraneous whitespace in expression ASTs #33477

ayazhafiz opened this issue Oct 29, 2019 · 1 comment 路 May be fixed by #34690

Comments

@ayazhafiz
Copy link
Member

@ayazhafiz ayazhafiz commented Oct 29, 2019

馃悶 bug report

Affected Package

@angular/compiler

Is this a regression?

No

Description

The expression parser leaves trailing whitespace when generating some expressions ASTs. This is already a known issue:

it('should provide absolute offsets of expressions in a binary expression', () => {
expect(humanizeExpressionSource(parse('<div>{{1 + 2}}<div>').nodes))
.toEqual(jasmine.arrayContaining([
// TODO(ayazhafiz): The expression parser includes an extra whitespace on a expressions
// with trailing whitespace in a binary expression. Look into fixing this.
['1', new AbsoluteSourceSpan(7, 9)],
['2', new AbsoluteSourceSpan(11, 12)],
]));
});

This extra whitespace should probably be stripped so that e.g. diagnostic messages do not have trailing whitespace. See angular/vscode-ng-language-service#433 for an example of this affecting the language service.

This is unrelated to #31898.

馃敩 Minimal Reproduction

it('should provide absolute offsets of expressions in a binary expression', () => {
expect(humanizeExpressionSource(parse('<div>{{1 + 2}}<div>').nodes))
.toEqual(jasmine.arrayContaining([
// TODO(ayazhafiz): The expression parser includes an extra whitespace on a expressions
// with trailing whitespace in a binary expression. Look into fixing this.
['1', new AbsoluteSourceSpan(7, 9)],
['2', new AbsoluteSourceSpan(11, 12)],
]));
});

馃敟 Exception or Error

N/A

馃實 Your Environment

Angular Version:

9.0.0-next.*

Anything else relevant?

N/A

@ayazhafiz

This comment has been minimized.

Copy link
Member Author

@ayazhafiz ayazhafiz commented Oct 30, 2019

Taking a quick look, this should be fairly easy to fix by modifying the span() method of the expression parser to mark the end of an expression span using the current expression rather than using the start of the next expression:

get inputIndex(): number {
return (this.index < this.tokens.length) ? this.next.index + this.offset :
this.inputLength + this.offset;
}
span(start: number) { return new ParseSpan(start, this.inputIndex); }

Maybe by adding a new field on a token to mark its end:

export class Token {
constructor(
public index: number, public type: TokenType, public numValue: number,
public strValue: string) {}

ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Nov 2, 2019
In the past, only the starting index of an expression Token has been
recorded, so a parser could demarkate the span of a token only by the
start locations of two tokens. This may lead to trailing whitespace
being included in the token span:

```html
{{ token1   + token2 }}
   ^^^^^^^^^             recorded span of `token1`
```

It's also not enough for a parser to determine the end of a token by
adding the length of the token value to the token's start location,
because lexed expression values may not exactly reflect the source code.
For example, `"d\\"e"` is lexed as a string token whose value is `d"e`.

Instead, this commit adds a `end` field to expression tokens. `end`
is one past the last index of the token source code. This will enable a
parser to determine the span of a token just by looking at that token.

This is a breaking change because the contructor interface of `Token`
has changed.

Part of angular#33477.
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Dec 4, 2019
In the past, only the starting index of an expression Token has been
recorded, so a parser could demarkate the span of a token only by the
start locations of two tokens. This may lead to trailing whitespace
being included in the token span:

```html
{{ token1   + token2 }}
   ^^^^^^^^^             recorded span of `token1`
```

It's also not enough for a parser to determine the end of a token by
adding the length of the token value to the token's start location,
because lexed expression values may not exactly reflect the source code.
For example, `"d\\"e"` is lexed as a string token whose value is `d"e`.

Instead, this commit adds a `end` field to expression tokens. `end`
is one past the last index of the token source code. This will enable a
parser to determine the span of a token just by looking at that token.

This is a breaking change because the contructor interface of `Token`
has changed.

Part of angular#33477.
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Dec 19, 2019
This commit fixes a bug with the expression parser wherein the end index
of an expression node was recorded as the start index of the next token,
and not the end index of the current token.

Closes angular#33477
Closes angular/vscode-ng-language-service#433
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Dec 19, 2019
This commit fixes a bug with the expression parser wherein the end index
of an expression node was recorded as the start index of the next token,
and not the end index of the current token.

Closes angular#33477
Closes angular/vscode-ng-language-service#433
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Dec 19, 2019
In the past, only the starting index of an expression Token has been
recorded, so a parser could demarkate the span of a token only by the
start locations of two tokens. This may lead to trailing whitespace
being included in the token span:

```html
{{ token1   + token2 }}
   ^^^^^^^^^             recorded span of `token1`
```

It's also not enough for a parser to determine the end of a token by
adding the length of the token value to the token's start location,
because lexed expression values may not exactly reflect the source code.
For example, `"d\\"e"` is lexed as a string token whose value is `d"e`.

Instead, this commit adds a `end` field to expression tokens. `end`
is one past the last index of the token source code. This will enable a
parser to determine the span of a token just by looking at that token.

This is a breaking change because the contructor interface of `Token`
has changed.

Part of angular#33477.
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Dec 19, 2019
This commit fixes a bug with the expression parser wherein the end index
of an expression node was recorded as the start index of the next token,
and not the end index of the current token.

Closes angular#33477
Closes angular/vscode-ng-language-service#433
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Dec 19, 2019
This commit fixes a bug with the expression parser wherein the end index
of an expression node was recorded as the start index of the next token,
and not the end index of the current token.

Closes angular#33477
Closes angular/vscode-ng-language-service#433
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Dec 19, 2019
In the past, only the starting index of an expression Token has been
recorded, so a parser could demarkate the span of a token only by the
start locations of two tokens. This may lead to trailing whitespace
being included in the token span:

```html
{{ token1   + token2 }}
   ^^^^^^^^^             recorded span of `token1`
```

It's also not enough for a parser to determine the end of a token by
adding the length of the token value to the token's start location,
because lexed expression values may not exactly reflect the source code.
For example, `"d\\"e"` is lexed as a string token whose value is `d"e`.

Instead, this commit adds a `end` field to expression tokens. `end`
is one past the last index of the token source code. This will enable a
parser to determine the span of a token just by looking at that token.

This is a breaking change because the contructor interface of `Token`
has changed.

Part of angular#33477.
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Dec 19, 2019
This commit fixes a bug with the expression parser wherein the end index
of an expression node was recorded as the start index of the next token,
and not the end index of the current token.

Closes angular#33477
Closes angular/vscode-ng-language-service#433
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Dec 20, 2019
This commit fixes a bug with the expression parser wherein the end index
of an expression node was recorded as the start index of the next token,
not the end index of the current token.

In the monorepo, only the language service relied on the previous
functionality. To derive completion suggestions for expressions in a
bound attribute, the language service finds the expression AST a
cursor is in. With AST spans bound by individual tokens, this no longer
works because the AST spans may not encompass the entire attribute source
code.

As a solution, the parsed AST has its spans stretched so that the end of
each span aligns with the start of the next (where applicable). These
stretched spans are used to grab the AST on the LHS of the cursor
position.

```typescript
         v         cursor position
let n of | nums
^-- ^ ^-   ^---    ast spans
^---^-^----^---    stretched spans
```

This solution was not pulled out as a utility function with independent
tests to keep the scope and diff of this commit smaller. A future commit
and PR will provide a fixup to the implementation.

Closes angular#33477
Closes angular/vscode-ng-language-service#433
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Dec 20, 2019
This commit fixes a bug with the expression parser wherein the end index
of an expression node was recorded as the start index of the next token,
not the end index of the current token.

In the monorepo, only the language service relied on the previous
functionality. To derive completion suggestions for expressions in a
bound attribute, the language service finds the expression AST a
cursor is in. With AST spans bound by individual tokens, this no longer
works because the AST spans may not encompass the entire attribute source
code.

As a solution, the parsed AST has its spans stretched so that the end of
each span aligns with the start of the next (where applicable). These
stretched spans are used to grab the AST on the LHS of the cursor
position.

```typescript
         v         cursor position
let n of | nums
^-- ^ ^-   ^---    ast spans
^---^-^----^---    stretched spans
```

This solution was not pulled out as a utility function with independent
tests to keep the scope and diff of this commit smaller. A future commit
and PR will provide a fixup to the implementation.

Closes angular#33477
Closes angular/vscode-ng-language-service#433
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Dec 20, 2019
This commit fixes a bug with the expression parser wherein the end index
of an expression node was recorded as the start index of the next token,
not the end index of the current token.

In the monorepo, only the language service relied on the previous
functionality. To derive completion suggestions for expressions in a
bound attribute, the language service finds the expression AST a
cursor is in. With AST spans bound by individual tokens, this no longer
works because the AST spans may not encompass the entire attribute source
code.

As a solution, the parsed AST has its spans stretched so that the end of
each span aligns with the start of the next (where applicable). These
stretched spans are used to grab the AST on the LHS of the cursor
position.

```typescript
         v         cursor position
let n of | nums
^-- ^ ^-   ^---    ast spans
^---^-^----^---    stretched spans
```

This solution was not pulled out as a utility function with independent
tests to keep the scope and diff of this commit smaller. A future commit
and PR will provide a fixup to the implementation.

Closes angular#33477
Closes angular/vscode-ng-language-service#433
alxhub added a commit that referenced this issue Jan 6, 2020
In the past, only the starting index of an expression Token has been
recorded, so a parser could demarkate the span of a token only by the
start locations of two tokens. This may lead to trailing whitespace
being included in the token span:

```html
{{ token1   + token2 }}
   ^^^^^^^^^             recorded span of `token1`
```

It's also not enough for a parser to determine the end of a token by
adding the length of the token value to the token's start location,
because lexed expression values may not exactly reflect the source code.
For example, `"d\\"e"` is lexed as a string token whose value is `d"e`.

Instead, this commit adds a `end` field to expression tokens. `end`
is one past the last index of the token source code. This will enable a
parser to determine the span of a token just by looking at that token.

This is a breaking change because the contructor interface of `Token`
has changed.

Part of #33477.

PR Close #33549
alxhub added a commit that referenced this issue Jan 6, 2020
In the past, only the starting index of an expression Token has been
recorded, so a parser could demarkate the span of a token only by the
start locations of two tokens. This may lead to trailing whitespace
being included in the token span:

```html
{{ token1   + token2 }}
   ^^^^^^^^^             recorded span of `token1`
```

It's also not enough for a parser to determine the end of a token by
adding the length of the token value to the token's start location,
because lexed expression values may not exactly reflect the source code.
For example, `"d\\"e"` is lexed as a string token whose value is `d"e`.

Instead, this commit adds a `end` field to expression tokens. `end`
is one past the last index of the token source code. This will enable a
parser to determine the span of a token just by looking at that token.

This is a breaking change because the contructor interface of `Token`
has changed.

Part of #33477.

PR Close #33549
ayazhafiz added a commit to ayazhafiz/angular that referenced this issue Jan 9, 2020
This commit fixes a bug with the expression parser wherein the end index
of an expression node was recorded as the start index of the next token,
not the end index of the current token.

In the monorepo, only the language service relied on the previous
functionality. To derive completion suggestions for expressions in a
bound attribute, the language service finds the expression AST a
cursor is in. With AST spans bound by individual tokens, this no longer
works because the AST spans may not encompass the entire attribute source
code.

As a solution, the parsed AST has its spans stretched so that the end of
each span aligns with the start of the next (where applicable). These
stretched spans are used to grab the AST on the LHS of the cursor
position.

```typescript
         v         cursor position
let n of | nums
^-- ^ ^-   ^---    ast spans
^---^-^----^---    stretched spans
```

This solution was not pulled out as a utility function with independent
tests to keep the scope and diff of this commit smaller. A future commit
and PR will provide a fixup to the implementation.

Closes angular#33477
Closes angular/vscode-ng-language-service#433
@ayazhafiz ayazhafiz linked a pull request that will close this issue Jan 9, 2020
4 of 5 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can鈥檛 perform that action at this time.