definition of NCName too strict #540

Closed
jensopetersen opened this issue Mar 23, 2015 · 4 comments

@jensopetersen
Contributor

jensopetersen commented Mar 23, 2015

I would normally use only ASCII characters for xml:ids and suchlike, but for a specific project it makes sense to use Chinese characters as xml:ids. Here eXist seems to me to be too strict: while it accepts a Basic Multilingual Plane character such as U+4E00 (the second one below), it refuses later additions to Unicode.

<graphs>
    <graph cp="3400" xml:id="&#x3400;"/>
    <graph cp="4E00" xml:id="&#x4e00;"/>
    <graph cp="20000" xml:id="&#x20000;"/>
    <graph cp="f0000" xml:id="&#xf0000;"/>
    <graph cp="100000" xml:id="&#x100000;"/>
</graphs>
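To see which of these values current XML rules allow as names, here is a minimal Python sketch that tests each code point against the NameStartChar production of the XML 1.0 Fifth Edition. The ranges are transcribed from that production (with ':' dropped, since an NCName may not contain a colon); the function name `is_name_start_char_5e` is hypothetical. Under these ranges U+3400, U+4E00 and U+20000 may start a name, while U+F0000 and U+100000 (in the supplementary Private Use planes) lie above the final [#x10000-#xEFFFF] range and are not name characters even in the Fifth Edition.

```python
# NameStartChar ranges from XML 1.0 Fifth Edition, production [4],
# with ':' left out because an NCName (Namespaces in XML) has no colons.
NAME_START_CHAR_5E = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A), (0xC0, 0xD6),
    (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]


def is_name_start_char_5e(cp: int) -> bool:
    """Hypothetical helper: True if code point cp may begin an NCName."""
    return any(lo <= cp <= hi for lo, hi in NAME_START_CHAR_5E)


# The code points used as single-character xml:id values in the example.
for cp in (0x3400, 0x4E00, 0x20000, 0xF0000, 0x100000):
    print(f"U+{cp:04X}: {is_name_start_char_5e(cp)}")
```

A one-character name only has to satisfy the start-character rule, so checking NameStartChar is enough for these particular values.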
@wolfgangmm wolfgangmm added this to the eXist-3.0 milestone Jun 3, 2015
@wolfgangmm wolfgangmm self-assigned this Jun 3, 2015
@jensopetersen
Contributor Author

jensopetersen commented Jul 13, 2016

<x xml:id="1"/>

is well-formed, but Xerces holds on to the original XML 1.0 definition of NCName, which was superseded in the Fifth Edition of XML 1.0 (2008) by the name-character rules introduced in XML 1.1. Since eXist uses Xerces, xml:ids beginning with digits are treated as malformed in eXist, even though the document is well-formed according to XML 1.0.
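For reference, the NCName check itself can be sketched in Python as below, using the Fifth Edition NameStartChar and NameChar ranges with ':' removed. The helper names are hypothetical and the ranges are hand-transcribed from the specification. In both the Fourth and Fifth Edition productions, digits are allowed after the first character but not at the start, so the sketch reports "1" as not an NCName; the document containing it nonetheless remains well-formed, because the xml:id Recommendation treats such a value as a non-fatal xml:id error rather than a well-formedness error.

```python
# Ranges from XML 1.0 Fifth Edition productions [4] NameStartChar and
# [4a] NameChar, with ':' omitted because an NCName may not contain one.
NAME_START = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A), (0xC0, 0xD6),
    (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]
# Characters allowed only after the first position: '-', '.', the digits
# 0-9, MIDDLE DOT, combining diacritical marks and the two tie characters.
NAME_EXTRA = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_ncname(value: str) -> bool:
    """Hypothetical helper: True if value is an NCName (Fifth Edition rules)."""
    if not value or not _in_ranges(ord(value[0]), NAME_START):
        return False
    return all(
        _in_ranges(ord(c), NAME_START) or _in_ranges(ord(c), NAME_EXTRA)
        for c in value[1:]
    )


for candidate in ("1", "_1", "g1", "a:b"):
    print(repr(candidate), is_ncname(candidate))
```

The change between editions concerns which non-ASCII characters are allowed, not the position of digits, which is why a prefix such as "_" or a letter is the usual way to make numeric identifiers valid.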

@adamretter
Member

@jensopetersen I wonder if the Xerces property http://xml.org/sax/properties/document-xml-version could help, or perhaps even the feature http://xml.org/sax/features/xml-1.1?

See:
https://xerces.apache.org/xerces2-j/properties.html
https://xerces.apache.org/xerces2-j/features.html

@tuurma
Contributor

tuurma commented Jan 6, 2017

In the same vein, see the discussion at relaxng/jing-trang#188.

I encountered the issue when trying to use some polytonic Greek characters:

ͷ 0377 GREEK SMALL LETTER PAMPHYLIAN DIGAMMA
ϝ 03DD GREEK SMALL LETTER DIGAMMA
Ͷ 0376 GREEK CAPITAL LETTER PAMPHYLIAN DIGAMMA

which are perfectly kosher NameChars under the fifth edition of the XML specification (https://www.w3.org/TR/REC-xml/#d0e804) but, alas, not Letters according to the fourth edition (https://www.w3.org/TR/2006/REC-xml-20060816/#NT-Letter).
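The comparison for these three characters can be sketched in Python as follows. The sketch is restricted to the Greek and Coptic block (U+0370 to U+03FF): it uses a hand-transcribed excerpt of the Fourth Edition BaseChar production (the only part of Letter with ranges in this block) and the slice of the Fifth Edition NameStartChar ranges that falls in the same block. The table names and `in_ranges` are hypothetical, and the excerpt should be checked against the specification before being relied on.

```python
import unicodedata

# Excerpt of BaseChar, production [85] of XML 1.0 Fourth Edition, limited to
# the Greek and Coptic block (U+0370-U+03FF). In this block Letter reduces to
# BaseChar, since the Ideographic production has no ranges here.
LETTER_4E_GREEK = [
    (0x386, 0x386), (0x388, 0x38A), (0x38C, 0x38C), (0x38E, 0x3A1),
    (0x3A3, 0x3CE), (0x3D0, 0x3D6), (0x3DA, 0x3DA), (0x3DC, 0x3DC),
    (0x3DE, 0x3DE), (0x3E0, 0x3E0), (0x3E2, 0x3F3),
]
# The part of the Fifth Edition NameStartChar ranges that falls in this block.
NAME_START_5E_GREEK = [(0x370, 0x37D), (0x37F, 0x3FF)]


def in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


for ch in ("\u0377", "\u03dd", "\u0376"):
    cp = ord(ch)
    print(
        f"U+{cp:04X} {unicodedata.name(ch)}: "
        f"4th-edition Letter={in_ranges(cp, LETTER_4E_GREEK)}, "
        f"5th-edition NameStartChar={in_ranges(cp, NAME_START_5E_GREEK)}"
    )
```

The capital DIGAMMA at U+03DC is a Letter in the Fourth Edition, but its small form U+03DD is not, because the Fourth Edition tables list only the capital archaic letters.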

@dizzzz dizzzz modified the milestone: eXist-3.0 Feb 9, 2017
@joewiz joewiz added triage issue needs to be investigated and removed triage issue needs to be investigated labels Sep 17, 2018
@duncdrum
Contributor

This seems to have been fixed; I tested the OP's examples in 4.5.0.
Please open a new issue if there are still problems with NCName handling.

@line-o line-o removed the triage issue needs to be investigated label Apr 8, 2020

8 participants