Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Fix for atom/language-xml#91 and atom/language-xml#96 #101

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

vfcp
Copy link

@vfcp vfcp commented Oct 26, 2020

#96 already provides all the details about the issues fixed here.

#87 (comment) has the correct code but merge included some extra indent which causes the rule not to work properly.

In relation with #91, a begin without end or while was added but this is not valid as begin should always have a corresponding end or while. match should be used instead of begin

Requirements

  • Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
  • All new code requires tests to ensure against regressions

Description of the Change

The details are already discussed in detail in both issues #91 and #96.

Alternate Designs

N/A

Benefits

Fix invalid rules for JSP style comments, fixes #91

Possible Drawbacks

None

Applicable Issues

#91 #96

atom#96 already provides all the details about the issues fixed here.

atom#87 (comment) has the correct code but merge included some extra indent which causes the rule not to work properly.

In relation with atom#91, a `begin` without `end` or `while` was added but this is not valid as `begin` should always have a corresponding `end` or `while`.  `match` should be used instead of `begin`
Updated xml-spec.coffee for proposed solution
@Ingramz
Copy link

Ingramz commented Oct 26, 2020

Regardless of whether the changes here stay the same, I think it would be good overall if we had a couple of extra tests to better explain what kind of output we expect from snippets containing -- in comments. Right now the test only concerns the happy path where there exists --> after <!-- ... -- ..., but now we might also need to include some cases that also include highlightable code after -->; case where --> does not exist and the edge case including --->. It was somewhat (but not entirely) fine to omit these earlier, as the ouput was roughtly the same for anything past --, now they are different.

We can wait till the proposal in the corresponding issue gets finalized and then look at which tests exactly have to be implemented so that the tests do not need to be redone multiple times.

@darangi
Copy link
Contributor

darangi commented Oct 29, 2020

@vfcp could you add some tests?

Updated grammar as per latest solution discussed in atom#91
Added test to document the new behavior when invalid comments are found
Minor test correction
Minor test correction
@vfcp
Copy link
Author

vfcp commented Nov 2, 2020

I've added a test for <!-- invalid -- comment ---><n></n>--><n></n> which together with the already existing tests cover all cases mentioned by @Ingramz above.

Below is the test and the list of tokens generated for your review:

  it 'tokenizes after invalid comment only if comment was properly closed', ->
    {tokens} = grammar.tokenizeLine('<!-- invalid -- comment ---><n></n>--><n></n>')
    expect(tokens[0]).toEqual value: '<!--', scopes: ['text.xml', 'comment.block.xml', 'punctuation.definition.comment.xml']
    expect(tokens[1]).toEqual value: ' invalid ', scopes: ['text.xml', 'comment.block.xml']
    expect(tokens[2]).toEqual value: '--', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[3]).toEqual value: ' comment -', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[4]).toEqual value: '--', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[5]).toEqual value: '><n></n>', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[6]).toEqual value: '-->', scopes: ['text.xml', 'comment.block.xml', 'punctuation.definition.comment.xml']
    expect(tokens[7]).toEqual value: '<',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[8]).toEqual value: 'n',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'entity.name.tag.xml', 'entity.name.tag.localname.xml']
    expect(tokens[9]).toEqual value: '>',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[10]).toEqual value: '</', scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[11]).toEqual value: 'n',  scopes: ['text.xml', 'meta.tag.no-content.xml', 'entity.name.tag.xml', 'entity.name.tag.localname.xml']
    expect(tokens[12]).toEqual value: '>',  scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']

@Ingramz
Copy link

Ingramz commented Nov 4, 2020

The tests seem sufficient for now, however I (and probably also you) noticed that becuase of the (?=-->) it stops everwhere inside the invalid comment where --> is encountered and starts a new token. Would it be fine for you to change that matcher to (?<!-)(?=-->) and update tests so that contiguous parts on single line get matched as one large token instead of many small ones?

Update invalid comment end regexp to reduce generated tokens

Added test case for empty XML comment `<!---->`
@vfcp
Copy link
Author

vfcp commented Nov 5, 2020

While testing your suggestion to use (?<!-)(?=-->) I've noticed that some of the changes we made caused empty XML comments like <!----> not to be parsed correctly.

To summarize, this is the current rule. It's basically the same rule as before PR with addiction of 'end': '(?<!-)(?=-->)' to break invalid comment scope in case proper --> is found:

      {
        'begin': '<!--'
        'captures':
          '0':
            'name': 'punctuation.definition.comment.xml'
        'end': '-->'
        'name': 'comment.block.xml'
        'patterns': [
          {
            'begin': '--(?!>)'
            'end': '(?<!-)(?=-->)'
            'name': 'invalid.illegal.bad-comments-or-CDATA.xml'
          }
        ]
      }

List of tokens produced:

  it 'tokenizes after invalid comment only if comment was properly closed', ->
    {tokens} = grammar.tokenizeLine('<!-- invalid -- comment ---><n></n>--><n></n>')
    expect(tokens[0]).toEqual value: '<!--', scopes: ['text.xml', 'comment.block.xml', 'punctuation.definition.comment.xml']
    expect(tokens[1]).toEqual value: ' invalid ', scopes: ['text.xml', 'comment.block.xml']
    expect(tokens[2]).toEqual value: '--', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[3]).toEqual value: ' comment ---><n></n>', scopes: ['text.xml', 'comment.block.xml', 'invalid.illegal.bad-comments-or-CDATA.xml']
    expect(tokens[4]).toEqual value: '-->', scopes: ['text.xml', 'comment.block.xml', 'punctuation.definition.comment.xml']
    expect(tokens[5]).toEqual value: '<',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[6]).toEqual value: 'n',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'entity.name.tag.xml', 'entity.name.tag.localname.xml']
    expect(tokens[7]).toEqual value: '>',   scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[8]).toEqual value: '</', scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']
    expect(tokens[9]).toEqual value: 'n',  scopes: ['text.xml', 'meta.tag.no-content.xml', 'entity.name.tag.xml', 'entity.name.tag.localname.xml']
    expect(tokens[10]).toEqual value: '>',  scopes: ['text.xml', 'meta.tag.no-content.xml', 'punctuation.definition.tag.xml']

I've also added an additional test for the <!----> scenario which might be useful in future changes.

@Ingramz
Copy link

Ingramz commented Nov 5, 2020

Thank you for pointing this out, I'm glad at least we got one extra test out of it. I know for certain that we can do better, but at this point I'm not sure how easy it would be. We have made it clear enough that it's unintentional that multiple tokens are generated, but also realize that nobody should be impacted at least right now (file an issue report or a pull request if you are). The test suite should support now enough cases to not break it in a dumb manner in the future.

@darangi darangi added the triaged label Nov 5, 2020
@Ingramz
Copy link

Ingramz commented Nov 5, 2020

@vfcp actually I think this is simpler than I thought before, we could move --(?!>) into a lookahead: (?=--(?!>)), that should always create at most one token.

Updated invalid comment begin rule to reduce number of tokens generated
@vfcp
Copy link
Author

vfcp commented Nov 8, 2020

@Ingramz Updated rule and tests as suggested.

I've tested in Linguist and VSCode as well to make sure behavior is same as expected.

@Ingramz
Copy link

Ingramz commented Nov 8, 2020

Excellent!

@darangi it should be now good to review/merge.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

XML syntax highlighting fails when double hyphens appear inside multi-line comments
3 participants