Improve the support for attribute selectors#90
Improve the support for attribute selectors#90winstliu merged 6 commits intoatom:masterfrom hediyi:improve-attr-selector
Conversation
https://www.w3.org/TR/xml/#NT-NameStartChar Apart from being used to style HTML pages and user interfaces, CSS can also be applied to any XML document. HTML is not a derivative of XML, but all attributes in HTML are XML-compatible. So by imposing the Name syntax of XML, we get both sides covered. The original part of the pattern meaning to match attribute names tried to impose the syntax of CSS identifiers, which is not relevant here.
https://www.w3.org/TR/selectors4/#attribute-representation Attribute values must be CSS identifiers or strings. The reasons to replace the POSIX character classes ([:^ascii:]): * Using Unicode character properties is cleaner. * There is a good chance that this feature will be removed from Oniguruma 7.0.
|
ready to 🎸 @50Wliu Do you think the scope name "string.unquoted.attribute-value.css" is appropriate? |
|
Assigning myself so I don't forget about this. |
| 'match': '''(?x) | ||
| (\\[)\\s* | ||
| (?:(\\g<ident>|\\*)?(\\|))? # qualified name prefix | ||
| ([\\w:&&[^0-9]][\\w:.-]*+)\\s* # attribute name |
There was a problem hiding this comment.
I don't think these lines are correct. I don't see anywhere in the spec (which you also seem to have used) that if the value is exactly the ident then it should be treated specially.
Or maybe I'm misinterpreting this line and the meaning of \g?
| (\\[)\\s* | ||
| (?:(\\g<ident>|\\*)?(\\|))? # qualified name prefix | ||
| ([\\w:&&[^0-9]][\\w:.-]*+)\\s* # attribute name | ||
| (?: |
There was a problem hiding this comment.
See the above comment...since ident shouldn't be special, the quotes shouldn't be required here either.
| ([~|^$*]?=)\\s* | ||
| (?: # attribute value | ||
| (\\g<ident>) | ||
| | ((?>([\'"])(?:[^\\\\]|\\\\.)*?(\\k<-2>))) |
There was a problem hiding this comment.
Just to confirm, this isn't actually a matched line, correct? You're just defining the ident capture group here?
There was a problem hiding this comment.
In addition, can you explain in detail what the \\\\\\P{ASCII} part is for?
There was a problem hiding this comment.
Yes, exactly, here it defines a group named "ident" which is only meant to be referred to with (\g<ident>).
\\\\\\P{ASCII} is actually two things:
\\\\is to match ""\\P{ASCII}(upper case for negation, as with\B) is equivalent to[:^ascii:], which is exactlyISO 10646 characters U+00A0 and higher
These two features (\p{…} and [:…:]) kind of overlap. [:…:], basically the less flexible version of \p{…}, will be removed from Oniguruma in the future.
There was a problem hiding this comment.
Yeah, I knew what the \\P{ASCII} part was doing, I just got confused by the abundance of backslashes 😆. So an ident can contain backslashes?
There was a problem hiding this comment.
<ident-token> can contain escape sequences, which begin with a backslash. So technically, it can contain any characters (as long as you escape it) except U+0000. I don't want to include escape in the pattern of identifier coz I think it's not worth it, since authors rarely want to use escape in their ids, class names and stuff.
It's in the original pattern and I thought it's nice to have so I didn't remove it. Now that I think about it, I should probably remove it, since it's no good in a pattern that doesn't match whitespace.
If we were to support escape, the pattern could be something like:
-?(?:[_a-z]|(?![\u0000-\u0239]).*|\\[0-9a-f]{1,6}(\r\n|[ \t\r\n\f])?|\\[^\r\n\f0-9a-f])(?:[_a-z0-9-]|(?![\u0000-\u0239]).*|\\[0-9a-f]{1,6}(\r\n|[ \t\r\n\f])?|\\[^\r\n\f0-9a-f])*
Or we can simply change the pattern of identifier to .+, to let authors remind themselves what not to write.
There was a problem hiding this comment.
Let's just do .+. I try to be as close to the spec as possible but in this case I don't think it's worth it.
There was a problem hiding this comment.
After some tweaking, I found .+ didn't work. And even if it works, the pattern would suffer from backtracks that could have been avoided.
I've removed \\\\ from the pattern. The pattern should work in most cases so I think it's good enough for now.
| '1': | ||
| 'name': 'punctuation.definition.entity.css' | ||
| '2': | ||
| 'name': 'entity.other.namespace-prefix.css' |
There was a problem hiding this comment.
I noticed you dropped the extra attribute from the end of this. Was that for consistency?
There was a problem hiding this comment.
Yes, with language-html.
https://www.w3.org/TR/selectors4/#attrnmsp https://drafts.csswg.org/css-namespaces-3/#css-qnames The attribute name in an attribute selector is given as a CSS qualified name: a previously declared namespace prefix or an "asterisk" (*) MAY be prepended to the attribute name separated by the namespace separator "vertical bar" (|). https://drafts.csswg.org/css-namespaces-3/#syntax A namespace prefix must be a valid CSS identifier.

No description provided.