XML syntax highlighting fails when double hyphens appear inside multi-line comments #91

aeschli · 2018-09-12T09:20:30Z

From @tommai78101 on September 6, 2018 14:13

System Specs

Version: 1.27.0 (user setup)
Commit: 7b9afc4196bda60b0facdf62cfc02815509b23d6
Date: 2018-09-05T05:29:44.670Z
Electron: 2.0.7
Chrome: 61.0.3163.100
Node.js: 8.9.3
V8: 6.1.534.41
Architecture: x64
OS: Windows 7 LTSB

Description

When you put double hyphens (--) inside multi-line comment blocks in XML files, it breaks the syntax highlighting.

Opinion

Per the following information:

[Definition: Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string " -- " (double-hyphen) must not occur within comments.] Parameter entity references must not be recognized within comments.

http://www.w3.org/TR/REC-xml/#sec-comments

This is correct indication or correct response to an invalid -- inside XML comments, however, it should only highlight the offending -- in red, but keep the rest of the code with correct syntax highlighting, so it doesn't highlight the remaining code after the offending -- to be green entirely.

Steps to Reproduce:

Copy paste the following XML code in a new XML document in Visual Studio Code:

<?xml version="1.0" encoding="utf-8"?>
<Foo>
	<Class>
		<!-- This is a comment -->
		<Bar>Hello world</Bar>
		<!-- 
			This is a multiple-lines comment.
			Hello comment!
		-->
		<Baz>Goodbye world.</Baz>
		<!-- 
			This is a multiple-lines comment with double hyphens inside it.
			--
		-->
		<Error>Highlighting parsing error</Error>
	</Class>
</Foo>

Observe syntax highlighting breaks after "Goodbye World.".

Image

Does this issue occur when all extensions are disabled?: Yes
Does this issue occur when one of the extensions is disabled?: Yes

XML Tools 2.3.2

Copied from original issue: microsoft/vscode#58076

The text was updated successfully, but these errors were encountered:

Aerijo · 2018-09-12T11:14:19Z

@tommai78101 https://stackoverflow.com/questions/10842131/xml-comments-and

I don't know XML, but I'm guessing this is intentional.

tommai78101 · 2018-09-12T13:21:01Z

Yeah, I knew about the Stack Overflow, and how it's already in the Opinion section in OP, but I don't know about the last part where leaving out the 'end' parameter is intentional.

steventango · 2018-10-26T04:29:39Z

I worked on PR #87 that caused this behavior, it was intentional at the time, but in hindsight I agree with @tommai78101's opinion. However, last time @Ingramz's opinion was that "From the point where the error is first registered, the rest of the document will be invalid." So perhaps we should come to an agreement on what is the best way to do this before I open another PR.

Ingramz · 2018-10-30T10:57:12Z

The end part is left out intentionally so that it could not break out of the rule anymore. In theory this should make it easier to locate the source of error.

If there is a good reason why it should be changed or a reasonable use case that makes the developer experience more comfortable, we can always make the necessary adjustments.

atom#96 already provides all the details about the issues fixed here. atom#87 (comment) has the correct code but merge included some extra indent which causes the rule not to work properly. In relation with atom#91, a `begin` without `end` or `while` was added but this is not valid as `begin` should always have a corresponding `end` or `while`. `match` should be used instead of `begin`

Updated xml-spec.coffee for proposed solution

vfcp · 2020-10-26T11:40:57Z

I've created #101 to address this issue and also #96

I understand the comment above from @Ingramz and the intention to increase visibility on invalid comment is good but current approach blocks highlighting of other parts that are valid or detection of other issues further in the document.

Similar to compilers and other grammars, attempting to recover from error and continue parsing translates to better developer experience other than going though all possible issues one by one.

A possible midway approach could be to tokeninze with invalid.illegal.bad-comments-or-CDATA.xml but only until the next --> is found.

Ingramz · 2020-10-26T13:53:20Z

My understanding of the problem we are trying to solve now is that people get frustrated when they accidentally leave or type in -- which ruins the highlighting for the rest of the file, so it becomes impossible to work on until that segment is eliminated. If this is the case, we should mention this explicitly before committing to a change. Otherwise we are just blindly catering to someones opinion, which is frustrating if someone else comes with a different opinion later on. Please do correct me if I am mistaken.

Now that we have something to solve, I think we should go with the midway approach such that erroneous parts shall not be attempted to highlight correctly at all as proposed by @vfcp. Terminating the comment at the next valid --> seems like a reasonable tradeoff to me. Also consider cases containing --->, in which case I think it is important that we do not stop the comment there as it is cannot be part of a valid comment either.

If someone else has any thoughts on this, feel free to provide your take on this.

vfcp · 2020-10-30T23:19:27Z

So, for the midway approach, I'm proposing adding an end regexp like this (^|[^-])(?=-->)

Final rule would be:

      {
        'begin': '<!--'
        'captures':
          '0':
            'name': 'punctuation.definition.comment.xml'
        'end': '-->'
        'name': 'comment.block.xml'
        'patterns': [
          {
            'begin': '--(?!>)'
            'end': '(^|[^-])(?=-->)'
            'captures':
              '0':
                'name': 'invalid.illegal.bad-comments-or-CDATA.xml'
          }
        ]
      }

I'll test this in linguist, vscode's textmate and atom's firstmate before updating my PR.

Ingramz · 2020-11-01T22:41:59Z

@vfcp in general it looks good. The captures section in the subpattern probably doesn't do what you want, move the name to the same level as begin / end instead. You'll probably notice it when you test it yourself.

While investigating this, I came up with a slightly different approach to capture the comments:

      {
        'begin': '<!--'
        'captures':
          '0':
            'name': 'punctuation.definition.comment.xml'
        'end': '(?<!-)-->'
        'name': 'comment.block.xml'
        'patterns': [
          {
            'begin': '--'
            'end': '(?=-->)'
            'name': 'invalid.illegal.bad-comments-or-CDATA.xml'
          }
        ]
      }

It should be doing the same thing, but it feels a little cleaner to me. Let me know what you think.

Updated grammar as per latest solution discussed in atom#91 Added test to document the new behavior when invalid comments are found

Minor test correction

vfcp · 2020-11-02T14:59:25Z

@Ingramz Your solution works much better with Linguist highlighter and it's also much easier to understand and maintain.

The ---> special case is handled on the correct level and the invalid comment begin/end rules reflect exactly the behavior we are trying to achieve.

I've updated the PR #101 code and tests accordingly.

Thanks a lot for your feedback!

Update invalid comment end regexp to reduce generated tokens Added test case for empty XML comment ``

Updated invalid comment begin rule to reduce number of tokens generated

aeschli mentioned this issue Sep 12, 2018

XML syntax highlighting fails when double hyphens appear inside multi-line comments microsoft/vscode#58076

Closed

rsese mentioned this issue Jan 25, 2019

XML syntax definition problem #96

Closed

tobbi mentioned this issue Aug 4, 2019

XML comments inside other XML comments cause the rest of the document to be highlighted as a comment microsoft/vscode#78442

Closed

vfcp linked a pull request Oct 26, 2020 that will close this issue

Fix for atom/language-xml#91 and atom/language-xml#96 #101

Open

vfcp added a commit to vfcp/language-xml that referenced this issue Oct 26, 2020

Fix for atom#91 and atom#96

0e5c951

Updated xml-spec.coffee for proposed solution

vfcp added a commit to vfcp/language-xml that referenced this issue Nov 2, 2020

Fix for atom#91 and atom#96

6086fe0

Updated grammar as per latest solution discussed in atom#91 Added test to document the new behavior when invalid comments are found

vfcp added a commit to vfcp/language-xml that referenced this issue Nov 2, 2020

Fix for atom#91 and atom#96

f42aa7e

Minor test correction

vfcp added a commit to vfcp/language-xml that referenced this issue Nov 2, 2020

Fix for atom#91 and atom#96

c293b84

Minor test correction

vfcp added a commit to vfcp/language-xml that referenced this issue Nov 5, 2020

Fix for atom#91 and atom#96

15d1f0a

Update invalid comment end regexp to reduce generated tokens Added test case for empty XML comment ``

vfcp added a commit to vfcp/language-xml that referenced this issue Nov 8, 2020

Fix for atom#91 and atom#96

60fc36f

Updated invalid comment begin rule to reduce number of tokens generated

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

XML syntax highlighting fails when double hyphens appear inside multi-line comments #91

XML syntax highlighting fails when double hyphens appear inside multi-line comments #91

aeschli commented Sep 12, 2018

Aerijo commented Sep 12, 2018 •

edited

Loading

tommai78101 commented Sep 12, 2018

steventango commented Oct 26, 2018 •

edited

Loading

Ingramz commented Oct 30, 2018

vfcp commented Oct 26, 2020 •

edited

Loading

Ingramz commented Oct 26, 2020 •

edited

Loading

vfcp commented Oct 30, 2020

Ingramz commented Nov 1, 2020

vfcp commented Nov 2, 2020

XML syntax highlighting fails when double hyphens appear inside multi-line comments #91

XML syntax highlighting fails when double hyphens appear inside multi-line comments #91

Comments

aeschli commented Sep 12, 2018

System Specs

Description

Opinion

Steps to Reproduce:

Image

Aerijo commented Sep 12, 2018 • edited Loading

tommai78101 commented Sep 12, 2018

steventango commented Oct 26, 2018 • edited Loading

Ingramz commented Oct 30, 2018

vfcp commented Oct 26, 2020 • edited Loading

Ingramz commented Oct 26, 2020 • edited Loading

vfcp commented Oct 30, 2020

Ingramz commented Nov 1, 2020

vfcp commented Nov 2, 2020

Aerijo commented Sep 12, 2018 •

edited

Loading

steventango commented Oct 26, 2018 •

edited

Loading

vfcp commented Oct 26, 2020 •

edited

Loading

Ingramz commented Oct 26, 2020 •

edited

Loading