Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XSD definition improvements #6389

Merged
merged 3 commits into from
Jun 21, 2017

Conversation

mikeSimonson
Copy link
Contributor

No description provided.

@@ -200,6 +200,13 @@
<xs:anyAttribute namespace="##other"/>
</xs:complexType>

<xs:simpleType name="TABLENAME" id="TABLENAME">
<xs:restriction base="xs:token">
<xs:pattern value="[-._:A-Za-z0-9\`]" id="TABLENAME.pattern">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is way too restrictive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more something like that /[a-zA-Z_\x01-\xff\`]+/ ?
Based on https://dev.mysql.com/doc/refman/5.7/en/identifiers.html

</xs:complexType>

<xs:simpleType name="CLASSPATH" id="CLASSPATH">
<xs:restriction base="xs:token">
<xs:pattern value="([a-zA-Z_$][a-zA-Z\d_$\\]*\.)*[a-zA-Z_$][a-zA-Z\d_$\\]*" id="CLASSPATH.pattern">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex for class names is something like /^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]+$/ - not just ascii

@@ -200,6 +200,13 @@
<xs:anyAttribute namespace="##other"/>
</xs:complexType>

<xs:simpleType name="TABLENAME" id="TABLENAME">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with the uppercase naming?

</xs:complexType>

<xs:simpleType name="CLASSPATH" id="CLASSPATH">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with the uppercase naming?

@mikeSimonson
Copy link
Contributor Author

@Ocramius updated. Is there an existing place to add tests or should I create it ?

@Ocramius
Copy link
Member

Is there an existing place to add tests or should I create it ?

Good question. Wondering if there's an XSD validation tool anywhere (not an online tool)

@mikeSimonson
Copy link
Contributor Author

@Ocramius I was thinking using something along the line of http://php.net/manual/en/domdocument.schemavalidate.php

@mikeSimonson
Copy link
Contributor Author

and it seems that this mechanism is already used.

@Ocramius
Copy link
Member

Ocramius commented Apr 10, 2017 via email

@@ -412,9 +419,16 @@
<xs:sequence>
<xs:any minOccurs="0" maxOccurs="unbounded" namespace="##other"/>
</xs:sequence>
<xs:attribute name="class" type="xs:string" use="required" />
<xs:attribute name="class" type="orm:classpath" use="required" />
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fqcn is probably better here no ?

@lcobucci
Copy link
Member

If you want to create some valid/invalid documents, you can give it a shot.
I was more wondering about the XSD itself :-)

@Ocramius AFAIK we would have issues when trying to validate a document with an invalid XSD, so I'd be fine once the test suite is passing

Copy link
Member

@Ocramius Ocramius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

@Ocramius Ocramius changed the title Xml dtd improvements XSD definition improvements Jun 21, 2017
@Ocramius Ocramius added this to the 2.6.0 milestone Jun 21, 2017
@Ocramius Ocramius self-assigned this Jun 21, 2017
@Ocramius Ocramius merged commit 33cee11 into doctrine:master Jun 21, 2017
@Ocramius
Copy link
Member

@mikeSimonson this started failing in master, but I really have no clue why: all XML files in that specific test seem to be fine :-\

@mikeSimonson
Copy link
Contributor Author

mikeSimonson commented Jun 21, 2017 via email

@Ocramius
Copy link
Member

Ocramius commented Jun 21, 2017 via email

@guilliamxavier
Copy link
Contributor

(Sorry if this seems not very constructive, but if it may help a little:)

Is "[a-zA-Z_u01-uff.]+" (and "[a-zA-Z_u01-uff][a-zA-Z0-9_u01-uff]+") really the intended pattern?

  • About the u01-uff part, isn't it missing backslashes like \u01-\uff, or probably more like \u0001-\u00ff (or \x01-\xff)? (As it is, I interpret it as "u or 0 or a char in the range 1-u (which feels like a pretty weird range) or f or f (bis)"...)

@lcobucci
Copy link
Member

@guilliamxavier @mikeSimonson mistery solved on #6515, ancient libxml versions (but now we're facing some other issues with MySQL/MariaDB on Trusty env 😜 )

@guilliamxavier
Copy link
Contributor

@lcobucci After looking deeper into the test failures, I have indeed realized that they were actually not related to pattern validation...
But, independently of the tests, I still wonder: is "[u01-uff]" actually a desired character class?

@lcobucci
Copy link
Member

@guilliamxavier tbh I'm not sure, I'll leave that question to @mikeSimonson

@mikeSimonson
Copy link
Contributor Author

@guilliamxavier The charachter class is based on what mysql accept as identifier (https://dev.mysql.com/doc/refman/5.7/en/identifiers.html). If you have a better proposition that also take into account other dbms fire away or better pull request away.

@mikeSimonson mikeSimonson deleted the xml-dtd-improvements branch June 28, 2017 20:55
@guilliamxavier
Copy link
Contributor

guilliamxavier commented Jun 29, 2017

@mikeSimonson Thanks for the feedback. My first and main concern is actually about the syntax of the pattern:

[a-zA-Z_u01-uff.]+

which, as shown below, doesn't accept e.g. "`my-foo`".

TL;DR: u01-uff is not "the range of Unicode characters 01 to ff" (and there are additional issues). Explanations (and possible correction) follow, with test scripts at bottom.

1. XML pattern syntax for Unicode characters

As I said in my first (i.e. second-to-last) comment, the bare (non-escaped) u01 and uff just look wrong to me, and some testing (script further below) actually confirmed that the above pattern means this:

([a-z]|[A-Z]|_|u|0|[1-u]|f|f|\.)+

(where, notably, f is duplicated and [1-u] is an unusual ASCII range corresponding to the following characters: 123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu) which is very probably not what was intended. For example, it doesn't accept the legit quoted table name `my-foo` in the XML line

    <entity name="Foo" table="`my-foo`">

and gives errors:

[ERROR 1839] Element '{http://doctrine-project.org/schemas/orm/doctrine-mapping}entity', attribute 'table': [facet 'pattern'] The value '`my-foo`' is not accepted by the pattern '[a-zA-Z_u01-uff.]+'.
[ERROR 1824] Element '{http://doctrine-project.org/schemas/orm/doctrine-mapping}entity', attribute 'table': '`my-foo`' is not a valid value of the atomic type '{http://doctrine-project.org/schemas/orm/doctrine-mapping}tablename'.

(because - isn't included in the character class) while it accepts the disputable unquoted table name my;foo in

    <entity name="Foo" table="my;foo">

(because ; is).

I guess that the part u01-uff was actually intended to mean "the range of Unicode characters 01 to ff", so I tried different backslash-escaped forms, e.g. for u01 for U+0001 I tested: \x01, \x{1}, \u0001, \u{1} (and similarly for uff for U+00FF), both lower- and upper-case, with and without braces, varying the amount of leading zeros... but each time got the following kind of errors:

Warning: DOMDocument::schemaValidate(): Invalid Schema in .../test.php on line 10

[FATAL 1450] failed to compile: Wrong escape sequence, misuse of character '\'
[FATAL 1450] failed to compile: xmlFAParseCharClass: ']' expected
[FATAL 1450] failed to compile: xmlFAParseRegExp: extra characters
[ERROR 1756] Element '{http://www.w3.org/2001/XMLSchema}pattern': The value '[a-zA-Z_\x01-\xff.]+' of the facet 'pattern' is not a valid regular expression.

Note that those errors (and PHP warning) occur when libxml is parsing the XSD Schema, before even trying to validate any XML document.

The correct syntax in "XML Regex" is actually to use XML entities e.g. &#x01; (or &#x0001; or &#x1;). But now I get:

Warning: DOMDocument::schemaValidate(): Invalid Schema in .../test.php on line 10

[FATAL 9] xmlParseCharRef: invalid xmlChar value 1
[ERROR 3067] Failed to parse the XML resource '.../doctrine-mapping.xsd'.

because U+0001 is not a valid character in XML 1.0 (note: but it is in XML 1.1). It seems that the only valid chars before U+0020 (space) are U+0009 (\t), U+000A (\n) and U+000D (\r). So if I change the pattern to

[a-zA-Z_&#x09;-&#xff;.]+

then it "works", i.e. no more XSD errors, and the quoted table name `my-foo` is accepted (which was the purpose of your commit f537eb2 after all).

2. Actual meaning

But you must realize that &#x09;-&#xff; already includes all the chars in a-zA-Z_., so the previous pattern actually boils down to

[&#x09;-&#xff;]+

which one might find unexpected.

3. [My]SQL identifier

Now, according to your link, the correct pattern for a MySQL identifier would be more like

[0-9a-zA-Z$_&#x0080;-&#xFFFF;]+|`[&#x0001;-&#x007F;&#x0080;-&#xFFFF;]+`

(NB: FFFF, not 00FF which was meant by ff (ignoring the case)). But again, XML (1.0) doesn't like 0001, and also not FFFF neither FFFE; plus we can merge the ranges ...-007F and 0080-...; so:

[0-9a-zA-Z$_&#x0080;-&#xFFFD;]+|`[&#x0009;-&#xFFFD;]+`

which "works". But it is still not complete (emphasis mine):

  • Identifiers may begin with a digit but unless quoted may not consist solely of digits.
  • Database, table, and column names cannot end with space characters.
  • Identifier quote characters can be included within an identifier if you quote the identifier. (...) then you need to double the character. The following statement creates a table named a`b (...): CREATE TABLE `a``b` (...);

(and not accounting that with ANSI_QUOTES you can also use " as the quote character).

And as you said, there are other DBMS, with possibly different accepted chars and restrictions...

Moreover, you seemed to want to accept the dot character, e.g. table="myschema.mytable"?

(At this point, maybe one could as well consider going with just type="xs:string" (as was done in commit fafb816)?)

4. FQCN

The same concerns apply to your second commit b433257, and the correct character class for a FQCN will of course be different than for a MySQL identifier (e.g. it must accept \, and should not accept `, - or ;...)


Finally here is my test.php script (inspired from https://github.com/doctrine/doctrine2/blob/388afb46d0cb3ed0c51332e8df0de9e942c2690b/tests/Doctrine/Tests/ORM/Mapping/XmlMappingDriverTest.php#L151 + PHP manual): https://pastebin.com/YEreZ8xC

And some minimalist XML test file examples (change xxx to the values to test):


What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants