Skip to content
This repository was archived by the owner on Dec 15, 2022. It is now read-only.

Comments

Improve the support for attribute selectors#90

Merged
winstliu merged 6 commits intoatom:masterfrom
hediyi:improve-attr-selector
Oct 11, 2016
Merged

Improve the support for attribute selectors#90
winstliu merged 6 commits intoatom:masterfrom
hediyi:improve-attr-selector

Conversation

@hediyi
Copy link
Contributor

@hediyi hediyi commented Sep 29, 2016

No description provided.

https://www.w3.org/TR/xml/#NT-NameStartChar

Apart from being used to style HTML pages and user interfaces, CSS can
also be applied to any XML document. HTML is not a derivative of XML,
but all attributes in HTML are XML-compatible. So by imposing the Name
syntax of XML, we get both sides covered.

The original part of the pattern meaning to match attribute names tried
to impose the syntax of CSS identifiers, which is not relevant here.
https://www.w3.org/TR/selectors4/#attribute-representation

Attribute values must be CSS identifiers or strings.

The reasons to replace the POSIX character classes ([:^ascii:]):
* Using Unicode character properties is cleaner.
* There is a good chance that this feature will be removed from
  Oniguruma 7.0.
@hediyi
Copy link
Contributor Author

hediyi commented Sep 30, 2016

ready to 🎸

@50Wliu Do you think the scope name "string.unquoted.attribute-value.css" is appropriate?

@winstliu winstliu self-assigned this Oct 6, 2016
@winstliu
Copy link
Contributor

winstliu commented Oct 6, 2016

Assigning myself so I don't forget about this.

'match': '''(?x)
(\\[)\\s*
(?:(\\g<ident>|\\*)?(\\|))? # qualified name prefix
([\\w:&&[^0-9]][\\w:.-]*+)\\s* # attribute name
Copy link
Contributor

@winstliu winstliu Oct 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these lines are correct. I don't see anywhere in the spec (which you also seem to have used) that if the value is exactly the ident then it should be treated specially.

Or maybe I'm misinterpreting this line and the meaning of \g?

(\\[)\\s*
(?:(\\g<ident>|\\*)?(\\|))? # qualified name prefix
([\\w:&&[^0-9]][\\w:.-]*+)\\s* # attribute name
(?:
Copy link
Contributor

@winstliu winstliu Oct 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the above comment...since ident shouldn't be special, the quotes shouldn't be required here either.

([~|^$*]?=)\\s*
(?: # attribute value
(\\g<ident>)
| ((?>([\'"])(?:[^\\\\]|\\\\.)*?(\\k<-2>)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, this isn't actually a matched line, correct? You're just defining the ident capture group here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, can you explain in detail what the \\\\\\P{ASCII} part is for?

Copy link
Contributor Author

@hediyi hediyi Oct 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly, here it defines a group named "ident" which is only meant to be referred to with (\g<ident>).

\\\\\\P{ASCII} is actually two things:

  • \\\\ is to match ""
  • \\P{ASCII} (upper case for negation, as with \B) is equivalent to [:^ascii:], which is exactly ISO 10646 characters U+00A0 and higher

These two features (\p{…} and [:…:]) kind of overlap. [:…:], basically the less flexible version of \p{…}, will be removed from Oniguruma in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I knew what the \\P{ASCII} part was doing, I just got confused by the abundance of backslashes 😆. So an ident can contain backslashes?

Copy link
Contributor Author

@hediyi hediyi Oct 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

screen shot 2016-10-09 at 7 43 52 am

<ident-token> can contain escape sequences, which begin with a backslash. So technically, it can contain any characters (as long as you escape it) except U+0000. I don't want to include escape in the pattern of identifier coz I think it's not worth it, since authors rarely want to use escape in their ids, class names and stuff.

It's in the original pattern and I thought it's nice to have so I didn't remove it. Now that I think about it, I should probably remove it, since it's no good in a pattern that doesn't match whitespace.

If we were to support escape, the pattern could be something like:

-?(?:[_a-z]|(?![\u0000-\u0239]).*|\\[0-9a-f]{1,6}(\r\n|[ \t\r\n\f])?|\\[^\r\n\f0-9a-f])(?:[_a-z0-9-]|(?![\u0000-\u023‌​9]).*|\\[0-9a-f]{1,6‌​}(\r\n|[ \t\r\n\f])?|\\[^\r\n\f0-9a-f])*

Or we can simply change the pattern of identifier to .+, to let authors remind themselves what not to write.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just do .+. I try to be as close to the spec as possible but in this case I don't think it's worth it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some tweaking, I found .+ didn't work. And even if it works, the pattern would suffer from backtracks that could have been avoided.

I've removed \\\\ from the pattern. The pattern should work in most cases so I think it's good enough for now.

'1':
'name': 'punctuation.definition.entity.css'
'2':
'name': 'entity.other.namespace-prefix.css'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed you dropped the extra attribute from the end of this. Was that for consistency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, with language-html.

https://www.w3.org/TR/selectors4/#attrnmsp
https://drafts.csswg.org/css-namespaces-3/#css-qnames

The attribute name in an attribute selector is given as a CSS qualified
name: a previously declared namespace prefix or an "asterisk" (*) MAY be
prepended to the attribute name separated by the namespace separator
"vertical bar" (|).

https://drafts.csswg.org/css-namespaces-3/#syntax

A namespace prefix must be a valid CSS identifier.
@winstliu winstliu merged commit f018166 into atom:master Oct 11, 2016
@hediyi hediyi deleted the improve-attr-selector branch October 13, 2016 02:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants