Fix for atom/language-xml#91 and atom/language-xml#96 #101

vfcp · 2020-10-26T10:42:55Z

#96 already provides all the details about the issues fixed here.

#87 (comment) has the correct code but merge included some extra indent which causes the rule not to work properly.

In relation with #91, a begin without end or while was added but this is not valid as begin should always have a corresponding end or while. match should be used instead of begin

Requirements

Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
All new code requires tests to ensure against regressions

Description of the Change

The details are already discussed in detail in both issues #91 and #96.

Alternate Designs

N/A

Benefits

Fix invalid rules for JSP style comments, fixes #91

Possible Drawbacks

None

Applicable Issues

#91 #96

atom#96 already provides all the details about the issues fixed here. atom#87 (comment) has the correct code but merge included some extra indent which causes the rule not to work properly. In relation with atom#91, a `begin` without `end` or `while` was added but this is not valid as `begin` should always have a corresponding `end` or `while`. `match` should be used instead of `begin`

Updated xml-spec.coffee for proposed solution

Ingramz · 2020-10-26T14:37:29Z

Regardless of whether the changes here stay the same, I think it would be good overall if we had a couple of extra tests to better explain what kind of output we expect from snippets containing -- in comments. Right now the test only concerns the happy path where there exists --> after ; case where --> does not exist and the edge case including --->. It was somewhat (but not entirely) fine to omit these earlier, as the ouput was roughtly the same for anything past --, now they are different.

We can wait till the proposal in the corresponding issue gets finalized and then look at which tests exactly have to be implemented so that the tests do not need to be redone multiple times.

darangi · 2020-10-29T18:50:53Z

@vfcp could you add some tests?

Updated grammar as per latest solution discussed in atom#91 Added test to document the new behavior when invalid comments are found

Minor test correction

vfcp · 2020-11-02T15:08:02Z

I've added a test for <n></n>--><n></n> which together with the already existing tests cover all cases mentioned by @Ingramz above.

Below is the test and the list of tokens generated for your review:

  it 'tokenizes after invalid comment only if comment was properly closed', ->
    {tokens} = grammar.tokenizeLine('<!-- invalid -- comment ---><n></n>--><n></n>')
    expect(tokens[0]).toEqual value: '<!--', scopes: ['text.xml', 'comment.block.xml', 'punctuation.definition.comment.xml']
    expect(tokens[1]).toEqual value: ' invalid ', scopes: ['text.xml', 'comment.block.xml']
    expect(tokens[2]).toEqual value: '--', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[3]).toEqual value: ' comment -', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[4]).toEqual value: '--', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[5]).toEqual value: '><n></n>', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[6]).toEqual value: '-->', scopes: ['text.xml', 'comment.block.xml', 'punctuation.definition.comment.xml']
    expect(tokens[7]).toEqual value: '<',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[8]).toEqual value: 'n',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'entity.name.tag.xml', 'entity.name.tag.localname.xml']
    expect(tokens[9]).toEqual value: '>',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[10]).toEqual value: '</', scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[11]).toEqual value: 'n',  scopes: ['text.xml', 'meta.tag.no-content.xml', 'entity.name.tag.xml', 'entity.name.tag.localname.xml']
    expect(tokens[12]).toEqual value: '>',  scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']

Ingramz · 2020-11-04T13:36:41Z

The tests seem sufficient for now, however I (and probably also you) noticed that becuase of the (?=-->) it stops everwhere inside the invalid comment where --> is encountered and starts a new token. Would it be fine for you to change that matcher to (?<!-)(?=-->) and update tests so that contiguous parts on single line get matched as one large token instead of many small ones?

Update invalid comment end regexp to reduce generated tokens Added test case for empty XML comment ``

vfcp · 2020-11-05T09:36:23Z

While testing your suggestion to use (?<!-)(?=-->) I've noticed that some of the changes we made caused empty XML comments like  not to be parsed correctly.

To summarize, this is the current rule. It's basically the same rule as before PR with addiction of 'end': '(?<!-)(?=-->)' to break invalid comment scope in case proper --> is found:

      {
        'begin': '<!--'
        'captures':
          '0':
            'name': 'punctuation.definition.comment.xml'
        'end': '-->'
        'name': 'comment.block.xml'
        'patterns': [
          {
            'begin': '--(?!>)'
            'end': '(?<!-)(?=-->)'
            'name': 'invalid.illegal.bad-comments-or-CDATA.xml'
          }
        ]
      }

List of tokens produced:

  it 'tokenizes after invalid comment only if comment was properly closed', ->
    {tokens} = grammar.tokenizeLine('<!-- invalid -- comment ---><n></n>--><n></n>')
    expect(tokens[0]).toEqual value: '<!--', scopes: ['text.xml', 'comment.block.xml', 'punctuation.definition.comment.xml']
    expect(tokens[1]).toEqual value: ' invalid ', scopes: ['text.xml', 'comment.block.xml']
    expect(tokens[2]).toEqual value: '--', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[3]).toEqual value: ' comment ---><n></n>', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[4]).toEqual value: '-->', scopes: ['text.xml', 'comment.block.xml', 'punctuation.definition.comment.xml']
    expect(tokens[5]).toEqual value: '<',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[6]).toEqual value: 'n',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'entity.name.tag.xml', 'entity.name.tag.localname.xml']
    expect(tokens[7]).toEqual value: '>',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[8]).toEqual value: '</', scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[9]).toEqual value: 'n',  scopes: ['text.xml', 'meta.tag.no-content.xml', 'entity.name.tag.xml', 'entity.name.tag.localname.xml']
    expect(tokens[10]).toEqual value: '>',  scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']

I've also added an additional test for the  scenario which might be useful in future changes.

Ingramz · 2020-11-05T11:03:45Z

Thank you for pointing this out, I'm glad at least we got one extra test out of it. I know for certain that we can do better, but at this point I'm not sure how easy it would be. We have made it clear enough that it's unintentional that multiple tokens are generated, but also realize that nobody should be impacted at least right now (file an issue report or a pull request if you are). The test suite should support now enough cases to not break it in a dumb manner in the future.

Ingramz · 2020-11-05T23:45:19Z

@vfcp actually I think this is simpler than I thought before, we could move --(?!>) into a lookahead: (?=--(?!>)), that should always create at most one token.

Updated invalid comment begin rule to reduce number of tokens generated

vfcp · 2020-11-08T11:03:58Z

@Ingramz Updated rule and tests as suggested.

I've tested in Linguist and VSCode as well to make sure behavior is same as expected.

Ingramz · 2020-11-08T21:17:45Z

Excellent!

@darangi it should be now good to review/merge.

vfcp added 2 commits October 26, 2020 14:41

Fix for atom#91 and atom#96

0e5c951

Updated xml-spec.coffee for proposed solution

vfcp mentioned this pull request Oct 26, 2020

XML syntax highlighting fails when double hyphens appear inside multi-line comments #91

Open

darangi added the needs-testing label Oct 29, 2020

vfcp added 3 commits November 2, 2020 18:29

Fix for atom#91 and atom#96

6086fe0

Updated grammar as per latest solution discussed in atom#91 Added test to document the new behavior when invalid comments are found

Fix for atom#91 and atom#96

f42aa7e

Minor test correction

Fix for atom#91 and atom#96

c293b84

Minor test correction

Fix for atom#91 and atom#96

15d1f0a

Update invalid comment end regexp to reduce generated tokens Added test case for empty XML comment ``

Ingramz approved these changes Nov 5, 2020

View reviewed changes

darangi added the triaged label Nov 5, 2020

Fix for atom#91 and atom#96

60fc36f

Updated invalid comment begin rule to reduce number of tokens generated

Merge branch 'atom:master' into master

a47b650

tobil4sk mentioned this pull request Mar 6, 2023

Consider adopting a new XML grammar: XML comment bug due to deprecated atom/language-xml grammar microsoft/vscode#176193

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for atom/language-xml#91 and atom/language-xml#96 #101

Fix for atom/language-xml#91 and atom/language-xml#96 #101

vfcp commented Oct 26, 2020 •

edited

Loading

Ingramz commented Oct 26, 2020

darangi commented Oct 29, 2020

vfcp commented Nov 2, 2020

Ingramz commented Nov 4, 2020

vfcp commented Nov 5, 2020

Ingramz commented Nov 5, 2020

Ingramz commented Nov 5, 2020

vfcp commented Nov 8, 2020

Ingramz commented Nov 8, 2020

Fix for atom/language-xml#91 and atom/language-xml#96 #101

Are you sure you want to change the base?

Fix for atom/language-xml#91 and atom/language-xml#96 #101

Conversation

vfcp commented Oct 26, 2020 • edited Loading

Requirements

Description of the Change

Alternate Designs

Benefits

Possible Drawbacks

Applicable Issues

Ingramz commented Oct 26, 2020

darangi commented Oct 29, 2020

vfcp commented Nov 2, 2020

Ingramz commented Nov 4, 2020

vfcp commented Nov 5, 2020

Ingramz commented Nov 5, 2020

Ingramz commented Nov 5, 2020

vfcp commented Nov 8, 2020

Ingramz commented Nov 8, 2020

vfcp commented Oct 26, 2020 •

edited

Loading