EPUBCHECK V4.0 through pagina giving incorrect error messages #643

Closed
jeankaplansky opened this issue Oct 26, 2015 · 7 comments
Labels
status: wontfix The issue is rejected due to limitations (of scope or dev resources) type: duplicate The issue duplicates an existing issue

Comments

@jeankaplansky

I was working with a publisher file today where the IDs in the NCX were extremely long, e.g.:

f00dce77-86e2-4bca-bdd9-d73160d0588c

Upon running EPUBCHECK 4.0 (pagina's packaging), I got a log file complaining about the use of ":" in the IDs. None of the IDs actually contained a ":" character, however.

I established that changing the IDs to shorter strings like "AAAA" was all that was required to bring the file back to a point of validation with Pagina.

However, running the same file through EPUBCHECK 3.0.1 via oXygen XML Editor did not throw the errors and the file was considered valid.

Here are the specific errors I'm getting with EPUBCHECK V4.0:

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 21, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 27, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 33, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 46, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 52, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 64, col 71):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 70, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 76, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 82, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 88, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 100, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 107, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

ERROR (RSC-005) at "the-monsters-of-education-technology.epub/toc.ncx" (line 120, col 72):
Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

So the first spurious error complains about the presence of ":". The IDs in question, while long, are also perfectly valid if we go by the rule that XML IDs "must start with a letter, underscore, or colon, and be composed of letters and numbers, underscores, periods, and hyphens" per https://books.google.com/books?id=xYEyK_7TWP8C&pg=PA531&lpg=PA531&dq=XML+IDs+must+begin+with+a+letter+a+number+or+an+underscore&source=bl&ots=Ps2AMfv_pG&sig=5pij0YU55T7olOvkFqVSjI_fwxY&hl=en&sa=X&ved=0CCQQ6AEwAWoVChMI9obj2fDgyAIVCRo-Ch3VJAvB#v=onepage&q=XML%20IDs%20must%20begin%20with%20a%20letter%20a%20number%20or%20an%20underscore&f=false

The same IDs do not pop errors in EPUB 3.x executed through Oxygen XML Editor.

We need to establish whether this is a pagina-specific error or something specific to EPUBCHECK 4.X.

Please let me know any questions or comments.

Thanks,
Jean Kaplansky
jkaplansky@safaribooksonline.com

@mattgarrish
Member

If you're generating GUIDs for IDs you need to make sure that they don't start with a number.

The error message is not singling out a colon as the error; it says that the value has to be an XML name without any colons in it to be valid. If the value has no colons, then it is failing the "valid XML name" part.

It sounds like this message could use some tweaking, as Tzviya also ran into it in #533.

Perhaps break the statement in two, along the lines of "must be a valid XML name, and must not contain any colons".
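
The two conditions in that message can be checked separately. The following is a minimal sketch, not EpubCheck's or Jing's actual logic: it uses an ASCII-only approximation of an XML name without colons (the real rule also admits many non-ASCII characters), and the function names `is_ncname_ascii` and `explain` are hypothetical. The second ID in the usage note is made up for illustration.

```python
import re

# ASSUMPTION: ASCII-only approximation of "an XML name without colons".
# The real definition also allows a wide range of non-ASCII letters.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname_ascii(value: str) -> bool:
    """Return True if value is an ASCII XML name that contains no colon."""
    return NCNAME_ASCII.fullmatch(value) is not None


def explain(value: str) -> str:
    """Give a more specific reason than the combined validator message."""
    if is_ncname_ascii(value):
        return "ok"
    if not value:
        return "empty value"
    if ":" in value:
        return "contains a colon"
    if not re.match(r"[A-Za-z_]", value):
        return "does not start with a letter or underscore"
    return "contains a character not allowed in a name"
```

Under this approximation, the ID quoted in the report, `f00dce77-86e2-4bca-bdd9-d73160d0588c`, passes because it happens to start with a letter, while a GUID-style value such as `9a0dce77-86e2-4bca-bdd9-d73160d0588c` (hypothetical) fails on its first character.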

@rdeltour
Member

Right, "XML name without colons" reflects the requirement that the value of the id attribute match the NCName production defined in the Namespaces in XML 1.0 spec.

This error message comes from Jing (the third-party RelaxNG + Schematron validation engine used by EpubCheck) and, as far as I remember, we cannot easily override it to make it more user-friendly.

See also #193, #224, #307, #533.

I'm closing as wontfix, since we do not have control over the message.

@rdeltour rdeltour added type: duplicate The issue duplicates an existing issue status: wontfix The issue is rejected due to limitations (of scope or dev resources) labels Oct 26, 2015
@jeankaplansky
Author

Sorry. I did do a cursory search before filing the issue. Too bad about not being able to come up with something more obvious. Should I tell people to leave their really long IDs alone in the future, or is it just better for downstream practices if everything truly passes EPUBCHECK?

Thanks,
Jean

@rdeltour
Member

Too bad about not being able to come up with something more obvious.

Yes. I'll have another look to see if this can be somehow improved.

Should I tell people to leave their really long IDs alone in the future, or is it just better for downstream practices if everything truly passes EPUBCHECK?

I'd say it's better if it passes validation as early as possible 😄. Long IDs are fine, as long as they match the type restrictions. As Matt said, the culprit is generally a numeral used as the first character.
Also, note that the latest oXygen version, 17.1, comes with EpubCheck 4.0.
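
One way to keep generated GUIDs while avoiding a leading numeral is to prefix only the values that need it. This is a rough sketch under stated assumptions: the helper name `make_safe_id` and the `id-` prefix are hypothetical, and it only addresses the first character, which is enough for GUIDs since they otherwise contain just hex digits and hyphens.

```python
import re

# A name must start with a letter or underscore (ASCII-only simplification).
_NAME_START = re.compile(r"[A-Za-z_]")


def make_safe_id(raw: str, prefix: str = "id-") -> str:
    """Return raw unchanged if it already starts with a letter or
    underscore; otherwise return it with prefix prepended."""
    if raw and _NAME_START.match(raw):
        return raw
    return prefix + raw
```

If an ID is rewritten this way, anything else in the publication that refers to the old value would need the same change.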

@jeankaplansky
Author

Thanks for letting me know. I'll update my Oxygen.

@mattgarrish
Member

Oh, right, I'll have to try to file that away in longer-term memory. I thought you were outputting the message and that you closed the last one because you couldn't specify the exact location of the problem in the value...

@tofi86
Collaborator

tofi86 commented Oct 26, 2015

This error message comes from Jing (the third-party RelaxNG + Schematron validation engine used by EpubCheck) and, as far as I remember, we cannot easily override it to make it more user-friendly.

@rdeltour I haven't tested yet, but now that the Jing properties files are available in this repo ( https://github.com/IDPF/epubcheck/blob/master/src/main/resources/com/thaiopensource/datatype/xsd/resources/Messages.properties#L53 ) it should be easy to tweak the message...
