pageutils sectionID function removing dots and colons: why?
#2580
Comments
The rule of no dots in headline IDs was implemented around 2009, likely to avoid conflicts with the CSS class selector (.).
Also see the discussion in the old bug tracker on that: https://bugs.dokuwiki.org/1627.html
Thanks. I see. I already thought that might be what's behind it: caution in regards to JavaScript/jQuery and CSS selector syntax. However, in this case the caution seems to be based on a misreading of the W3C spec for CSS2. (Or rather a miswriting: The spec formulation is really badly phrased at this point.) In the old bug tracker (https://bugs.dokuwiki.org/1627.html), HåkanS quotes the W3C spec:
(highlighting of parts in bold by me, as well as marking a self-contained expression with { ... } to avoid a reference error when parsing) So the first sentence says "only", but then the very next sentence continues with "also", which is really confusing. And then the qualification "as a numeric code" applies only to "any ISO 10646 character", as can be concluded from the examples given there. I found that this website does a good job of phrasing all this more clearly: https://mathiasbynens.be/notes/css-escapes
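To make the escaping rule concrete, here is a minimal sketch of how an id containing dots or colons could be turned into a valid CSS identifier. The function name `escapeCssIdent` is hypothetical, and the sketch assumes a non-empty id made of printable characters; it does not cover every edge case of the CSS grammar (a leading hyphen followed by a digit, control characters, and so on).

```javascript
// Minimal sketch of CSS identifier escaping; not a complete implementation.
// Assumption: `id` is non-empty and contains printable characters only.
function escapeCssIdent(id) {
  return Array.from(id)
    .map((ch, i) => {
      // A leading digit has to be written as a hex escape followed by a space.
      if (i === 0 && /[0-9]/.test(ch)) {
        return "\\" + ch.codePointAt(0).toString(16) + " ";
      }
      // Letters, digits, hyphen and underscore need no escaping.
      if (/[A-Za-z0-9_-]/.test(ch)) return ch;
      // Non-ASCII characters may appear unescaped in identifiers.
      if (ch.codePointAt(0) >= 0x80) return ch;
      // Everything else (".", ":", ...) is escaped with a preceding backslash.
      return "\\" + ch;
    })
    .join("");
}

// Build a selector for an element whose id attribute is "mv.01.01".
const selector = "#" + escapeCssIdent("mv.01.01");
```

In a browser, the built-in `CSS.escape()` does this job more thoroughly; the hand-written version above is only meant to show that the escaping burden falls on the selector, not on the id attribute in the HTML.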
These are the relevant diagrams ("railroad diagrams"): The latter diagram makes it clear that any non-newline and non-hex (i.e., not 0-9 and A-F [case insensitive]) character can be inserted in an escaped way, by simply preceding it with a backslash. (Even standard "allowed" characters (except for newline and hex characters), which would not need escaping, can be inserted in this way, e.g. '\g' in a CSS identifier is equivalent to 'g'. [Note that '\n' for example is not escaped to a newline in CSS, but simply to 'n'.]) The standard JavaScript DOM methods, like With all that said: This only tells us what's allowed in CSS, and how it must be escaped in CSS (and JavaScript dealing with CSS selectors). When simply generating HTML and filling the "id" and "class" attributes, one does not have to worry about escaping. So
The CSS standard would allow any character in an escaped form inside identifiers. But HTML or XHTML have more restrictions on what can be used in "id" and "class" attributes: From the XHTML 1.0 standard under the heading "C.8. Fragment Identifiers":
(emphasis mine; I think the the So using dots and colons inside
A detailed explanation of why colons and periods in section IDs should be allowed, and why they cause no problems, can be found in this three-year-old issue: dokuwiki#2580. I have been running a DokuWiki with this small modification for three years; allowing '.' and ':' in section ids and in anchor links to them has caused no problems in all this time.
In `inc/pageutils.php`, line 231, there is a function named `sectionID`.
This function is apparently used to transform fragment identifiers in URLs and the id attributes of section headers into some "allowed" format, by removing, for example, dots and colons, among other things.
It is almost exclusively called from within the function `_headerToLink` (defined identically in `inc/parser/xhtml.php` and `inc/parser/metadata.php`), which, despite its name, is also used for rendering the `href` in link tags from wiki code like this:
Anchorlink: [[#mv.01.01]]
which becomes (simplified)
Anchorlink: <a href="#mv0101">mv.01.01</a>
(i.e. the dots are removed from the `href`)
(called, for example, from within the functions `locallink` and `internallink` in `inc/parser/xhtml.php`)
So it is not possible to define anchor ids containing dots, for example, and link to them.
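The effect described above can be reproduced with a small model. This is a hypothetical simplification written in JavaScript for illustration only; it is not DokuWiki's PHP code, and the names `stripDotsAndColons` and `renderLocalLink` are made up. It only mimics the removal of dots and colons, not the other clean-up steps the real function performs.

```javascript
// Hypothetical, simplified model of the stripping step; NOT DokuWiki's actual code.
function stripDotsAndColons(fragment) {
  return fragment.replace(/[.:]/g, "");
}

// Hypothetical helper that renders a local anchor link like the example above.
// No HTML escaping is done here, to keep the sketch short.
function renderLocalLink(fragment) {
  return '<a href="#' + stripDotsAndColons(fragment) + '">' + fragment + "</a>";
}

const link = renderLocalLink("mv.01.01");
```

In this model, distinct names such as `mv.01.01` and `mv.0101` end up with the same fragment, because the separators that distinguished them are gone.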
I wonder what the rationale behind this behaviour is, if there is one, and whether it should in fact only pertain to the anchor ids of TOC-relevant section headers, which, for some reason, should not be allowed to contain for example dots and colons. (I understand that colons are used for the namespace hierarchy when not using URL rewrites, so there is an argument for disallowing colons in section ids for better legibility, but since the fragment is separated by a hash '#', this is not a technical necessity.)
Dots and colons are not illegal characters for fragment links and id tags in the XHTML 1.0 specification (see section "C.8. Fragment Identifiers" there). So I see no reason why they are stripped here.
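As a quick check of that claim, the pattern given in section C.8 of the XHTML 1.0 specification, `[A-Za-z][A-Za-z0-9:_.-]*`, can be tested directly. The sketch below is in JavaScript and the function name `isXhtmlSafeFragment` is hypothetical; it only tests ids against that one pattern and says nothing about other DokuWiki constraints.

```javascript
// Pattern from XHTML 1.0, section C.8, anchored to match the whole string.
const XHTML_FRAGMENT_PATTERN = /^[A-Za-z][A-Za-z0-9:_.-]*$/;

// Hypothetical helper: true if the id matches the C.8 pattern.
function isXhtmlSafeFragment(id) {
  return XHTML_FRAGMENT_PATTERN.test(id);
}

// Ids with dots or colons pass; ids starting with a digit do not.
const results = ["mv.01.01", "ns:page", "1abc"].map(isXhtmlSafeFragment);
```

Under this pattern, the restriction that matters is the first character (it must be a letter), not the presence of dots or colons later in the id.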
In a DokuWiki installation that I help administer, we need dots in some fragment URLs (including some section headers). So I changed `inc/pageutils.php`, line 231, from
to
without experiencing any problems after that, and I am now able to use dots in DokuWiki hash links and section ids.
If there is no actual technical reason why dots (and maybe also colons) should be disallowed, I propose to incorporate that change in the DokuWiki official source.