Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

XML syntax highlighting fails when double hyphens appear inside multi-line comments #91

Open
aeschli opened this issue Sep 12, 2018 · 9 comments · May be fixed by #101
Open

XML syntax highlighting fails when double hyphens appear inside multi-line comments #91

aeschli opened this issue Sep 12, 2018 · 9 comments · May be fixed by #101

Comments

@aeschli
Copy link

aeschli commented Sep 12, 2018

From @tommai78101 on September 6, 2018 14:13

System Specs

  • Version: 1.27.0 (user setup)
  • Commit: 7b9afc4196bda60b0facdf62cfc02815509b23d6
  • Date: 2018-09-05T05:29:44.670Z
  • Electron: 2.0.7
  • Chrome: 61.0.3163.100
  • Node.js: 8.9.3
  • V8: 6.1.534.41
  • Architecture: x64
  • OS: Windows 7 LTSB

Description

When you put double hyphens (--) inside multi-line comment blocks in XML files, it breaks the syntax highlighting.

Opinion

Per the following information:

[Definition: Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string " -- " (double-hyphen) must not occur within comments.] Parameter entity references must not be recognized within comments.

http://www.w3.org/TR/REC-xml/#sec-comments

This is correct indication or correct response to an invalid -- inside XML comments, however, it should only highlight the offending -- in red, but keep the rest of the code with correct syntax highlighting, so it doesn't highlight the remaining code after the offending -- to be green entirely.

Steps to Reproduce:

  1. Copy paste the following XML code in a new XML document in Visual Studio Code:
<?xml version="1.0" encoding="utf-8"?>
<Foo>
	<Class>
		<!-- This is a comment -->
		<Bar>Hello world</Bar>
		<!-- 
			This is a multiple-lines comment.
			Hello comment!
		-->
		<Baz>Goodbye world.</Baz>
		<!-- 
			This is a multiple-lines comment with double hyphens inside it.
			--
		-->
		<Error>Highlighting parsing error</Error>
	</Class>
</Foo>
  1. Observe syntax highlighting breaks after "Goodbye World.".

Image

image

Does this issue occur when all extensions are disabled?: Yes
Does this issue occur when one of the extensions is disabled?: Yes

  • XML Tools 2.3.2

Copied from original issue: microsoft/vscode#58076

@Aerijo
Copy link

Aerijo commented Sep 12, 2018

@tommai78101
Copy link

Yeah, I knew about the Stack Overflow, and how it's already in the Opinion section in OP, but I don't know about the last part where leaving out the 'end' parameter is intentional.

@steventango
Copy link
Contributor

steventango commented Oct 26, 2018

I worked on PR #87 that caused this behavior, it was intentional at the time, but in hindsight I agree with @tommai78101's opinion. However, last time @Ingramz's opinion was that "From the point where the error is first registered, the rest of the document will be invalid." So perhaps we should come to an agreement on what is the best way to do this before I open another PR.

@Ingramz
Copy link

Ingramz commented Oct 30, 2018

The end part is left out intentionally so that it could not break out of the rule anymore. In theory this should make it easier to locate the source of error.

If there is a good reason why it should be changed or a reasonable use case that makes the developer experience more comfortable, we can always make the necessary adjustments.

vfcp added a commit to vfcp/language-xml that referenced this issue Oct 26, 2020
atom#96 already provides all the details about the issues fixed here.

atom#87 (comment) has the correct code but merge included some extra indent which causes the rule not to work properly.

In relation with atom#91, a `begin` without `end` or `while` was added but this is not valid as `begin` should always have a corresponding `end` or `while`.  `match` should be used instead of `begin`
@vfcp vfcp linked a pull request Oct 26, 2020 that will close this issue
vfcp added a commit to vfcp/language-xml that referenced this issue Oct 26, 2020
Updated xml-spec.coffee for proposed solution
@vfcp
Copy link

vfcp commented Oct 26, 2020

I've created #101 to address this issue and also #96

I understand the comment above from @Ingramz and the intention to increase visibility on invalid comment is good but current approach blocks highlighting of other parts that are valid or detection of other issues further in the document.

Similar to compilers and other grammars, attempting to recover from error and continue parsing translates to better developer experience other than going though all possible issues one by one.

A possible midway approach could be to tokeninze with invalid.illegal.bad-comments-or-CDATA.xml but only until the next --> is found.

@Ingramz
Copy link

Ingramz commented Oct 26, 2020

My understanding of the problem we are trying to solve now is that people get frustrated when they accidentally leave or type in -- which ruins the highlighting for the rest of the file, so it becomes impossible to work on until that segment is eliminated. If this is the case, we should mention this explicitly before committing to a change. Otherwise we are just blindly catering to someones opinion, which is frustrating if someone else comes with a different opinion later on. Please do correct me if I am mistaken.

Now that we have something to solve, I think we should go with the midway approach such that erroneous parts shall not be attempted to highlight correctly at all as proposed by @vfcp. Terminating the comment at the next valid --> seems like a reasonable tradeoff to me. Also consider cases containing --->, in which case I think it is important that we do not stop the comment there as it is cannot be part of a valid comment either.

If someone else has any thoughts on this, feel free to provide your take on this.

@vfcp
Copy link

vfcp commented Oct 30, 2020

So, for the midway approach, I'm proposing adding an end regexp like this (^|[^-])(?=-->)

Final rule would be:

      {
        'begin': '<!--'
        'captures':
          '0':
            'name': 'punctuation.definition.comment.xml'
        'end': '-->'
        'name': 'comment.block.xml'
        'patterns': [
          {
            'begin': '--(?!>)'
            'end': '(^|[^-])(?=-->)'
            'captures':
              '0':
                'name': 'invalid.illegal.bad-comments-or-CDATA.xml'
          }
        ]
      }

I'll test this in linguist, vscode's textmate and atom's firstmate before updating my PR.

@Ingramz
Copy link

Ingramz commented Nov 1, 2020

@vfcp in general it looks good. The captures section in the subpattern probably doesn't do what you want, move the name to the same level as begin / end instead. You'll probably notice it when you test it yourself.

While investigating this, I came up with a slightly different approach to capture the comments:

      {
        'begin': '<!--'
        'captures':
          '0':
            'name': 'punctuation.definition.comment.xml'
        'end': '(?<!-)-->'
        'name': 'comment.block.xml'
        'patterns': [
          {
            'begin': '--'
            'end': '(?=-->)'
            'name': 'invalid.illegal.bad-comments-or-CDATA.xml'
          }
        ]
      }

It should be doing the same thing, but it feels a little cleaner to me. Let me know what you think.

vfcp added a commit to vfcp/language-xml that referenced this issue Nov 2, 2020
Updated grammar as per latest solution discussed in atom#91
Added test to document the new behavior when invalid comments are found
vfcp added a commit to vfcp/language-xml that referenced this issue Nov 2, 2020
Minor test correction
vfcp added a commit to vfcp/language-xml that referenced this issue Nov 2, 2020
Minor test correction
@vfcp
Copy link

vfcp commented Nov 2, 2020

@Ingramz Your solution works much better with Linguist highlighter and it's also much easier to understand and maintain.

The ---> special case is handled on the correct level and the invalid comment begin/end rules reflect exactly the behavior we are trying to achieve.

I've updated the PR #101 code and tests accordingly.

Thanks a lot for your feedback!

vfcp added a commit to vfcp/language-xml that referenced this issue Nov 5, 2020
Update invalid comment end regexp to reduce generated tokens

Added test case for empty XML comment `<!---->`
vfcp added a commit to vfcp/language-xml that referenced this issue Nov 8, 2020
Updated invalid comment begin rule to reduce number of tokens generated
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants