XSD regular expression flavor #255
Comments
From memory, I think I've already removed that regex in the current Development branch, because it conflicted with the datatype anyway: Lines 149 to 161 in 6d71cdf
My view is that Your point is of course still valid with respect to the need of documentation on regex flavour. My hope is to enforce it appropriately via the audit tool. |
I think I made that regex as an experiment to see whether it is possible to validate a positivelengthmeasure. I believe the regex validation site mentioned in the IDS docs accepted it, but there are probably better ways to do this.
@atomczak I think the shorthand \d is defined in https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/datatypes.html#cces-mce and matches only \p{Nd} (Number, decimal digit; see the General Category properties at https://www.unicode.org/reports/tr18/#General_Category_Property). Using the Unicode database, it is possible to find all characters in this set: https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
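One way to sanity-check this against the characters listed in the issue is to look up each character's General Category. The sketch below uses Python's standard `unicodedata` module; `is_nd` is a hypothetical helper name, and the result reflects the Unicode version bundled with the Python interpreter, not the behaviour of any particular XSD processor.

```python
import unicodedata


def is_nd(ch: str) -> bool:
    """Return True if ch is in Unicode General Category Nd (decimal digit)."""
    return unicodedata.category(ch) == "Nd"


# Sample characters taken from the issue text.
sample = "09¹¾六௰Ⅹ೬Дに"

for ch in sample:
    # Print the code point, its General Category, and whether \p{Nd} covers it.
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} Nd={is_nd(ch)}")
```

If \d really is equivalent to \p{Nd}, only the characters reported with `Nd=True` would be matched by it.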
Great suggestions @gverduci, this indeed confirms @atomczak's suspicion:
$ grep ';Nd;' UnicodeData.txt | cut -d\; -f1 | xargs -I{} printf \\U000{} 2> /dev/null
𐒠𐒡𐒢𐒣𐒤𐒥𐒦𐒧𐒨𐒩𐴰𐴱𐴲𐴳𐴴𐴵𐴶𐴷𐴸𐴹𑁦𑁧𑁨𑁩𑁪𑁫𑁬𑁭𑁮𑁯𑃰𑃱𑃲𑃳𑃴𑃵𑃶𑃷𑃸𑃹𑄶𑄷𑄸𑄹𑄺𑄻𑄼𑄽𑄾𑄿𑇐𑇑𑇒𑇓𑇔𑇕𑇖𑇗𑇘𑇙𑋰𑋱𑋲𑋳𑋴𑋵𑋶𑋷𑋸𑋹...
(These are just a few of them; I couldn't quickly figure out how to generically convert the hex-formatted code points to printable characters.)
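The conversion from hex code points to printable characters can be sketched in Python, where `chr(int(hex_string, 16))` handles code points of any length. This is only an illustration: `nd_characters` is a hypothetical helper, and the three sample lines are a tiny excerpt in the `UnicodeData.txt` format (semicolon-separated, code point in field 0, General Category in field 2). A possible reason the shell command printed only characters such as 𐒠 is that the fixed `\U000` prefix yields the eight hex digits `printf` expects only for five-digit code points.

```python
from typing import Iterable, List


def nd_characters(ucd_lines: Iterable[str]) -> List[str]:
    """Collect characters whose General Category field is Nd."""
    chars = []
    for line in ucd_lines:
        fields = line.strip().split(";")
        # Field 0 is the hex code point, field 2 is the General Category.
        if len(fields) > 2 and fields[2] == "Nd":
            chars.append(chr(int(fields[0], 16)))
    return chars


# A few lines in UnicodeData.txt format, used here instead of the full file.
sample_lines = [
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "104A0;OSMANYA DIGIT ZERO;Nd;0;L;;0;0;0;N;;;;;",
]

print("".join(nd_characters(sample_lines)))
```

To process the real database, the same function can be given an open file handle, for example `open("UnicodeData.txt", encoding="utf-8")`, assuming the file has been downloaded locally.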
Thanks all, I mainly wanted to make sure I wasn't mistaken. And yes, this example has already been removed from the latest Dev branch.
I see a potential problem with auditing regex -
Thanks! If I read this right, |
IDS/Development/IDS_oma.ids
Line 158 in 6d71cdf
Found it in sample files and this doesn't look like a correct regular expression in the XSD pattern flavor. By default, all XSD patterns look at the whole phrase, so ^...$ are not needed (or even supported).
I'm not sure about the shorthand \d. I think it is supported by XSD and matches all Unicode digits: 0-9¹¾六௰Ⅹ೬Дに... but it would be good if someone could confirm.
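To illustrate what "looking at the whole phrase" means in practice, here is a minimal sketch that emulates implicit anchoring with Python's `re.fullmatch`. Python's `re` module is not the XSD flavor, so this only demonstrates the whole-value behaviour, not XSD syntax; `matches_whole_value` is a hypothetical helper name.

```python
import re


def matches_whole_value(pattern: str, value: str) -> bool:
    """Emulate XSD-style matching: the pattern must cover the entire value."""
    return re.fullmatch(pattern, value) is not None


# The whole value consists of digits, so it matches without ^ or $.
print(matches_whole_value(r"[0-9]+", "123"))

# Trailing text makes the whole-value match fail...
print(matches_whole_value(r"[0-9]+", "123 mm"))

# ...even though an unanchored search would still find the digits.
print(re.search(r"[0-9]+", "123 mm") is not None)
```

Using an explicit character class such as `[0-9]` also sidesteps the question of which Unicode digits \d covers.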