EPUBCHECK V4.0 through pagina giving incorrect error messages #643

jeankaplansky · 2015-10-26T19:37:33Z

I was working with a publisher file today where the IDs in the NCX were extremely long, e.g.:

f00dce77-86e2-4bca-bdd9-d73160d0588c

Upon running EPUBCHECK 4.0 (pagina's packaging), I got a log file complaining about the use of ":" in the IDs. None of the IDs actually contained a ":" character, however.

I established that changing the IDs to shorter strings such as "AAAA" was all it took to make the file validate in Pagina again.

However, running the same file through EPUBCHECK 3.0.1 via oXygen XML Editor did not report these errors, and the file was considered valid.

Here are the specific errors I'm getting with EPUBCHECK V4.0:

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 21, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 27, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 33, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 46, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 52, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 64, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 70, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 76, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 82, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 88, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 100, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 107, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 120, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

So the first spurious error complains about the presence of ":". The IDs in question, while long, are also perfectly valid if XML IDs "must start with a letter, underscore, or colon, and be composed of letters and numbers, underscores, periods, and hyphens," per https://books.google.com/books?id=xYEyK_7TWP8C&pg=PA531&lpg=PA531&dq=XML+IDs+must+begin+with+a+letter+a+number+or+an+underscore&source=bl&ots=Ps2AMfv_pG&sig=5pij0YU55T7olOvkFqVSjI_fwxY&hl=en&sa=X&ved=0CCQQ6AEwAWoVChMI9obj2fDgyAIVCRo-Ch3VJAvB#v=onepage&q=XML%20IDs%20must%20begin%20with%20a%20letter%20a%20number%20or%20an%20underscore&f=false

The same IDs do not raise errors in EPUBCHECK 3.x run through oXygen XML Editor.

We need to establish whether this is a Pagina-specific error or something specific to EPUBCHECK 4.x.
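One way to see which values actually trigger RSC-005, regardless of whether the check runs through Pagina or oXygen, is to list every id attribute in the NCX that breaks the naming rule. The sketch below is a hypothetical Python helper (find_bad_ids is not part of EpubCheck or Pagina). It uses an ASCII-only approximation of the NCName rule, so it is stricter than the real rule and may flag legitimate non-ASCII IDs.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# ASCII-only approximation of NCName: a letter or underscore first, then
# letters, digits, underscores, periods, or hyphens (no colons anywhere).
# Real NCNames also allow many non-ASCII characters, so this is stricter.
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def find_bad_ids(ncx_text: str) -> List[str]:
    """Return every id attribute value in the NCX text that fails the check."""
    root = ET.fromstring(ncx_text)
    bad = []
    for elem in root.iter():  # document order, root element included
        value = elem.get("id")
        if value is not None and not ASCII_NCNAME.fullmatch(value):
            bad.append(value)
    return bad


if __name__ == "__main__":
    sample = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="f00dce77-86e2-4bca-bdd9-d73160d0588c" playOrder="1"/>
    <navPoint id="3f2504e0-4f89-11d3-9a0c-0305e82c3301" playOrder="2"/>
    <navPoint id="AAAA" playOrder="3"/>
  </navMap>
</ncx>"""
    print(find_bad_ids(sample))  # prints ['3f2504e0-4f89-11d3-9a0c-0305e82c3301']
```

Note that the example ID quoted at the top of this report begins with the letter f and passes; a GUID that happens to begin with a digit does not.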

Please let me know any questions or comments.

Thanks,
Jean Kaplansky
jkaplansky@safaribooksonline.com

mattgarrish · 2015-10-26T19:51:12Z

If you're generating GUIDs for IDs you need to make sure that they don't start with a number.

The error message is not singling out a colon as the problem; it says the value has to be an XML name without any colons in order to be valid. If your IDs contain no colons, it's the "valid XML name" part you're breaking.

It sounds like this message could use some tweaking, as Tzviya also ran into it in #533.

Perhaps split the statement in two, along the lines of "must be a valid XML name, and must not contain any colons".
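Matt's proposal to split the message into two separate checks can be sketched roughly as follows. This is a hypothetical Python helper, not EpubCheck's or Jing's code, and the message wording is only illustrative. It uses the Name character classes from XML 1.0 (Fifth Edition) and relies on the fact that an NCName (Namespaces in XML) is a Name with no colons; Jing's own datatype library may use slightly different character tables for non-ASCII input.

```python
import re
from typing import List

# Character classes from the Name production in XML 1.0 (Fifth Edition).
# NameStartChar includes ":"; an NCName is a Name that contains no colons.
_START = (
    ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
_NAME_CHAR = _START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

XML_NAME = re.compile(f"[{_START}][{_NAME_CHAR}]*")
NAME_START = re.compile(f"[{_START}]")


def explain_id_problems(value: str) -> List[str]:
    """Report each rule an id value breaks, as two independent checks."""
    problems = []
    # Check 1: is it an XML Name at all? (Colons are still allowed here.)
    if not XML_NAME.fullmatch(value):
        first = value[:1]
        if first and not NAME_START.fullmatch(first):
            problems.append(f"must be a valid XML name: it cannot start with {first!r}")
        else:
            problems.append("must be a valid XML name")
    # Check 2: NCName additionally forbids colons anywhere in the value.
    if ":" in value:
        problems.append("must not contain any colons")
    return problems


for candidate in ["f00dce77-86e2-4bca-bdd9-d73160d0588c",
                  "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
                  "a:b"]:
    print(candidate, explain_id_problems(candidate))
```

Reporting the two conditions separately means an ID with no colons gets only the "valid XML name" message, which is the confusion this issue describes.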

rdeltour · 2015-10-26T20:06:13Z

Right, "XML name without colons" comes from the value of the id attribute being required to match the NCName production as required in the Namespaces in XML 1.0 spec.

This error message comes from Jing (the third-party RelaxNG + Schematron validation engine used by EpubCheck), and as far as I remember we cannot easily override it to make it more user-friendly.

See also #193, #224, #307, #533.

I'm closing as wontfix, since we do not have control over the message.

jeankaplansky · 2015-10-26T20:09:09Z

Sorry. I did do a cursory search before filing the issue. Too bad about not being able to come up with something more obvious. Should I tell people to leave their really long IDs alone in the future, or is it just better for downstream practices if everything truly passes EPUBCHECK?

Thanks,
Jean

rdeltour · 2015-10-26T20:21:01Z

> Too bad about not being able to come up with something more obvious.

Yes. I'll have another look to see if this can be improved somehow.

> Should I tell people to leave their really long IDs alone in the future, or is it just better for downstream practices if everything truly passes EPUBCHECK?

I'd say it's better if it passes validation as early as possible 😄. Long IDs are fine, as long as they match the type restrictions. As Matt said, the culprit is generally a numeral used as the first character.
Also, note that the latest oXygen version, 17.1, comes with EpubCheck 4.0.
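For tooling that generates GUID-based IDs, the advice from mattgarrish and rdeltour amounts to making sure the first character is never a digit. A minimal sketch, assuming a letter prefix is acceptable in your workflow; make_safe_id, repair_id, and the "id-" prefix are all hypothetical choices, not part of any EPUB tool.

```python
import re
import uuid

# ASCII subset of NCName, used only to sanity-check the chosen prefix.
_SAFE_PREFIX = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def make_safe_id(prefix: str = "id-") -> str:
    """Return a GUID-based id whose first character can never be a digit.

    str(uuid.uuid4()) is 36 characters of lowercase hex digits and hyphens,
    so on its own it may start with 0-9; the prefix guarantees a legal start.
    """
    if not _SAFE_PREFIX.fullmatch(prefix):
        raise ValueError("prefix must be an ASCII NCName, e.g. 'id-'")
    return prefix + str(uuid.uuid4())


def repair_id(value: str, prefix: str = "id-") -> str:
    """Prefix an existing id only if it starts with a digit; otherwise keep it.

    This only handles the leading-digit case discussed in this thread.
    """
    return prefix + value if value[:1].isdigit() else value


print(make_safe_id())
print(repair_id("3f2504e0-4f89-11d3-9a0c-0305e82c3301"))  # prints id-3f2504e0-4f89-11d3-9a0c-0305e82c3301
```

Renaming an existing ID is only safe if nothing else refers to it; if other files link to that id, those references need the same change.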

jeankaplansky · 2015-10-26T20:49:25Z

Thanks for letting me know. I'll update my oXygen.

mattgarrish · 2015-10-26T21:46:37Z

Oh, right, I'll have to try to file that away in longer-term memory. I thought you were outputting the message yourselves, and that you closed the last one because you couldn't specify the exact location of the problem in the value...

tofi86 · 2015-10-26T22:33:02Z

> This error message comes from Jing (the third-party RelaxNG + Schematron validation engine used by EpubCheck), and as far as I remember we cannot easily override it to make it more user-friendly.

@rdeltour I haven't tested yet, but now that the Jing properties files are available in this repo ( https://github.com/IDPF/epubcheck/blob/master/src/main/resources/com/thaiopensource/datatype/xsd/resources/Messages.properties#L53 ) it should be easy to tweak the message...

rdeltour added the type: duplicate and status: wontfix labels on Oct 26, 2015

rdeltour closed this as completed Oct 26, 2015
