Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

XML syntax definition problem #96

Closed
alexr00 opened this issue Jan 21, 2019 · 7 comments
Closed

XML syntax definition problem #96

alexr00 opened this issue Jan 21, 2019 · 7 comments

Comments

@alexr00
Copy link

alexr00 commented Jan 21, 2019

From @omochi on January 19, 2019 17:0

I may found bug of syntax.

https://github.com/Microsoft/vscode/blob/10a1d2a50a2882f5ae85bdb51eb04d3064fb9de9/extensions/xml/syntaxes/xml.tmLanguage.json#L354-L363

It should be:

 { 
 	"begin": "<%--", 
 	"captures": { 
 		"0": { 
 			"name": "punctuation.definition.comment.xml" 
 		}
	}, 
 	"end": "--%>", 
 	"name": "comment.block.xml"  
 }, 

Copied from original issue: microsoft/vscode#66776

@alexr00
Copy link
Author

alexr00 commented Jan 21, 2019

From @omochi on January 19, 2019 17:24

Oh, I found more.

https://github.com/Microsoft/vscode/blob/10a1d2a50a2882f5ae85bdb51eb04d3064fb9de9/extensions/xml/syntaxes/xml.tmLanguage.json#L375-L380

should be:

"match": "--(?!>)",
"captures": {
    "0": {
        "name": "invalid.illegal.bad-comments-or-CDATA.xml"
    }
}

@rsese
Copy link

rsese commented Jan 25, 2019

Thanks @alexr00 @omochi - can either of you share a specific code snippet that demonstrates the problem?

@omochi
Copy link

omochi commented Jan 25, 2019

I recorded video.

https://www.youtube.com/watch?v=UeCwolx7oW0&feature=youtu.be

First issue:
begin and end are not paired.
So punctuation.definition.comment.xml scope does not constructed.

Second issue:
invalid.illegal.bad-comments-or-CDATA.xml is single token and not scope.
So match rule is better than begin.

VSCode process begin without end as match.
So no user level issue is. It is just for technically quality.
Because textmate syntax file is kinda interchangeable among some text editors.

@rsese
Copy link

rsese commented Jan 25, 2019

Thanks @omochi -

First issue:
begin and end are not paired.
So punctuation.definition.comment.xml scope does not constructed.

It looks like your example is:

<%-- --%>

I think those are JSP comments, not XML comments? In Atom, if I set the language to JSP the closing --%> is recognized correctly:

jsp-comment

x-ref: #87 and 0a04e02

Let me know if I'm misunderstanding something here.

Second issue:
invalid.illegal.bad-comments-or-CDATA.xml is single token and not scope.
So match rule is better than begin.

Your example here is:

<!-- -- -->

Related to the PR above, I believe this is already reported here: #91

@omochi
Copy link

omochi commented Jan 26, 2019

@rsese I just reports about syntax of grammar file.

report 1

See this current code.

      {
        'begin': '<%--'
        'captures':
          '0':
            'name': 'punctuation.definition.comment.xml'
          'end': '--%>'
          'name': 'comment.block.xml'
      }

{
'begin': '<%--'
'captures':
'0':
'name': 'punctuation.definition.comment.xml'
'end': '--%>'
'name': 'comment.block.xml'
}

end parameter is in object for captures key.
It is wrong.
end should be in same object with begin key.

So it should be this.

 { 
    "begin": "<%--", 
    "captures": { 
        "0": { 
            "name": "punctuation.definition.comment.xml" 
        }
    }, 
    "end": "--%>", 
    "name": "comment.block.xml"  
 }, 

Because JSP mode works correctly, JSP syntax is written as so.

  'comment':
    'begin': '<%--'
    'captures':
      '0':
        'name': 'punctuation.definition.comment.jsp'
    'end': '--%>'
    'name': 'comment.block.jsp'

https://github.com/atom/language-java/blob/3f78faa86e8f84786e69715e1911a028fe99035c/grammars/java%20server%20pages%20(jsp).cson#L55-L61

report 2

current definition is.

          {
            'begin': '--(?!>)'
            'captures':
              '0':
                'name': 'invalid.illegal.bad-comments-or-CDATA.xml'
          }

{
'begin': '--(?!>)'
'captures':
'0':
'name': 'invalid.illegal.bad-comments-or-CDATA.xml'
}

I think that to replace begin to match is better.
So it should be this.

          {
            'match': '--(?!>)'
            'captures':
              '0':
                'name': 'invalid.illegal.bad-comments-or-CDATA.xml'
          }

Because this block is intended to match -- inside comment.
It is single token match and not scope creation.
If it works in atom and VSCode, it is by their flexible fallback that convert begin to match if it has no corresponded end.
Changing match is good for compatibility of this syntax file among other text editors.

@omochi
Copy link

omochi commented Jan 26, 2019

More information about report (2).

I reported same concept for PHP syntax.

microsoft/vscode#66809

They accept it and make PR to patch this.

atom/language-php#352

This PR is also reviewed by @rsese . I say same things.

@no-response
Copy link

no-response bot commented Mar 4, 2019

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.

@no-response no-response bot closed this as completed Mar 4, 2019
vfcp added a commit to vfcp/language-xml that referenced this issue Oct 26, 2020
atom#96 already provides all the details about the issues fixed here.

atom#87 (comment) has the correct code but merge included some extra indent which causes the rule not to work properly.

In relation with atom#91, a `begin` without `end` or `while` was added but this is not valid as `begin` should always have a corresponding `end` or `while`.  `match` should be used instead of `begin`
vfcp added a commit to vfcp/language-xml that referenced this issue Oct 26, 2020
Updated xml-spec.coffee for proposed solution
vfcp added a commit to vfcp/language-xml that referenced this issue Nov 2, 2020
Updated grammar as per latest solution discussed in atom#91
Added test to document the new behavior when invalid comments are found
vfcp added a commit to vfcp/language-xml that referenced this issue Nov 2, 2020
Minor test correction
vfcp added a commit to vfcp/language-xml that referenced this issue Nov 2, 2020
Minor test correction
vfcp added a commit to vfcp/language-xml that referenced this issue Nov 5, 2020
Update invalid comment end regexp to reduce generated tokens

Added test case for empty XML comment `<!---->`
vfcp added a commit to vfcp/language-xml that referenced this issue Nov 8, 2020
Updated invalid comment begin rule to reduce number of tokens generated
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants