W3C validator emits warnings for LaTeXML-generated HTML5 #1016

Closed
flyn-org opened this issue Jul 14, 2018 · 5 comments

@flyn-org commented Jul 14, 2018

I created an HTML5 document using LaTeXML. I then ran the document through W3C's validator (https://validator.w3.org/nu/), and received the following errors and warnings:

Warning: This document appears to be written in English. Consider adding lang="en" (or variant) to the html start tag.
From line 2, column 16; to line 3, column 6
TYPE html>↩<html>↩<head

The following is odd because LaTeXML-html5.xsl contains 'omit-xml-declaration="yes"':

Error: Saw <?. Probable cause: Attempt to use an XML processing instruction in HTML. (XML processing instructions are not supported in HTML.)
At line 1, column 2
<?xml version="1

The following results from the down-arrow download link LaTeXML adds to lstlisting environments:

Error: Bad value for attribute href on element a: Illegal character in scheme data: line break is not allowed.
From line 176, column 158; to line 176, column 787
ing_data"><a href="data:text/plain;base64,Li9zY3JpcHRzL2ZlZWRzIGluc3RhbGwgY2EtY2VydGlmaWNhdGVzIFwKICAgICAgICAg…ICAgIHpvbmVpbmZvLWNvcmUgXAogICAgICAgICAgICAgICAgICAgICAgICB6b25l&#10;aW5mby1ub3J0aGFtZXJpY2E=&#10;">⬇</a><

@dginev added this to the LaTeXML-0.8.4 milestone Jul 14, 2018
@brucemiller (Owner) commented Jul 14, 2018

Interesting... Actually, LaTeXML doesn't know what language the document is in unless you use babel (and I'm not sure I want to build in language detection). Even then, it apparently isn't copying that language to where HTML5 wants it. I'll have to look into that.
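
For illustration, here is a minimal Python sketch of what carrying a babel language over to the HTML5 `lang` attribute could look like. The `BABEL_TO_BCP47` table and the helper names are hypothetical and deliberately incomplete; they are not LaTeXML's own mapping. Following babel's convention that the last language option is the main one, the sketch picks the last entry (it naively treats every comma-separated entry as a language name), and it emits a bare `<html>` rather than guessing when the language is unknown.

```python
# Hypothetical, incomplete mapping from babel option names to BCP 47 tags.
# Illustrative only; not LaTeXML's own table.
BABEL_TO_BCP47 = {
    "english": "en",
    "american": "en-US",
    "british": "en-GB",
    "french": "fr",
    "german": "de",
    "ngerman": "de",
    "spanish": "es",
    "italian": "it",
}


def babel_main_language(options):
    """Return the main language from a babel option list such as "french,english".

    In babel, the last language option is the main (default) language.
    """
    names = [opt.strip().lower() for opt in options.split(",") if opt.strip()]
    return names[-1] if names else None


def html_start_tag(options=None):
    """Build an <html> start tag, adding lang only when the language is known.

    Unknown or missing languages yield a bare <html> instead of a guess.
    """
    main = babel_main_language(options) if options else None
    lang = BABEL_TO_BCP47.get(main) if main else None
    return f'<html lang="{lang}">' if lang else "<html>"


if __name__ == "__main__":
    print(html_start_tag("french,english"))  # <html lang="en">
    print(html_start_tag("ngerman"))         # <html lang="de">
    print(html_start_tag())                  # <html>
```

Returning no `lang` for an unrecognized language keeps the validator's warning, but avoids declaring a wrong language, which matches the reluctance above to assume English.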

For the second error, it seems the XML declaration should only appear for XHTML (or one of the other XML formats). Baffled?

And for the third error, it looks as if a gratuitous encoded newline was added to the data. Not sure how it got there; I'll have to look into that too.

@flyn-org (Author) commented Jul 14, 2018

I am looking more into the second error. It might be the result of an XSLT change I made, but I am not yet sure.

I agree about the first item. I am not convinced that some kind of human-language statement in the TeX source or a heuristic would be a good general solution. Perhaps a command-line flag for latexmlpost?

@flyn-org (Author) commented Jul 14, 2018

With respect to the second error, it seems the --destination argument affects the output: with --destination=index.html.in the output includes the XML declaration, but with --destination=index.html it does not. In either case, I use "latexmlpost --destination=XXX --stylesheet=/path/to/modified/LaTeXML-html5.xsl index.xml".

I had expected the output document type would be wholly determined by the stylesheet and not partially by the destination filename.

@brucemiller (Owner) commented Jul 14, 2018

You'd think so, but the serialization to file also needs to know whether it's HTML or XML. If the destination extension had been recognized (e.g., plain .html), it would have generated HTML(5). Alternatively, you can use --format=html5.
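
To make that dependency concrete, here is a small Python model of the behavior described: an explicit format wins; otherwise a recognized HTML extension selects HTML serialization, and anything else (such as `.html.in`, whose final extension is `.in`) falls back to XML, which is where the `<?xml ...?>` declaration comes from. The extension set, the declaration string, and the function names are assumptions for illustration, not LaTeXML's actual logic.

```python
from pathlib import PurePath

# Assumed set of "recognized" HTML extensions; illustrative only.
HTML_EXTENSIONS = {".html", ".htm"}

XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def choose_serialization(destination, fmt=None):
    """Return "html" or "xml" for writing `destination`.

    An explicit format (modeled on --format=html5) takes precedence;
    otherwise only the final extension is examined, so "index.html.in"
    is seen as ".in" and falls back to XML.
    """
    if fmt is not None:
        return "html" if fmt.lower().startswith("html") else "xml"
    suffix = PurePath(destination).suffix.lower()
    return "html" if suffix in HTML_EXTENSIONS else "xml"


def serialize(markup, mode):
    """Prepend an XML declaration only for XML serialization."""
    return XML_DECLARATION + markup if mode == "xml" else markup


if __name__ == "__main__":
    doc = "<!DOCTYPE html>\n<html><head><title>t</title></head></html>"
    cases = [("index.html", None), ("index.html.in", None), ("index.html.in", "html5")]
    for dest, fmt in cases:
        mode = choose_serialization(dest, fmt)
        print(dest, fmt, mode, serialize(doc, mode).startswith("<?xml"))
```

In this model the stylesheet's `omit-xml-declaration="yes"` never gets a say once the file is written as XML, which is one way to read why the declaration showed up anyway.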

I fixed the third bug. I hadn't read the base64 encoding man page carefully enough: it inserts line breaks by default, and I hadn't noticed. It should work now. Thanks for the report!
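
The pitfall is easy to reproduce. In Perl's MIME::Base64, `encode_base64` inserts a newline every 76 characters unless its second (end-of-line) argument is an empty string; that this is the routine involved here is an assumption. The Python sketch below shows the same contrast with the standard `base64` module: `encodebytes` wraps MIME-style, while `b64encode` returns one unbroken line suitable for a `data:` URI. The `data_uri` helper is hypothetical.

```python
import base64

# Sample payload long enough (420 bytes) to exceed one 76-character base64 line.
payload = b"./scripts/feeds install ca-certificates \\\n" * 10

# MIME-style: a b"\n" after every 76 output characters, plus a trailing one.
# Embedded in an href, these become the &#10; entities the validator rejected.
wrapped = base64.encodebytes(payload).decode("ascii")

# Single line, no line breaks.
unwrapped = base64.b64encode(payload).decode("ascii")


def data_uri(data, mime="text/plain"):
    """Build a base64 data: URI with no embedded line breaks (hypothetical helper)."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")


if __name__ == "__main__":
    print("line breaks in wrapped:  ", wrapped.count("\n"))
    print("line breaks in unwrapped:", unwrapped.count("\n"))
    print(data_uri(b"hi"))  # data:text/plain;base64,aGk=
```

Stripping the newlines from the wrapped form yields exactly the unwrapped string, so the fix changes only the presentation of the encoded data, not its content.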

@brucemiller (Owner) commented Jul 27, 2018

Even though I didn't want to guess or assume that documents are in English by default, it bothered me that when using babel with an initial/default language, that language wasn't ending up declared on the document. It also turns out that xml:lang was only being carried over into HTML at the top element. I've fixed both of those problems, so the validator should be happy more often. Thanks for the report!
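
As a rough illustration of carrying `xml:lang` through to every element rather than only the top one, here is a Python sketch using the standard `xml.etree.ElementTree` module. It copies each `xml:lang` value to an HTML `lang` attribute and drops the XML-namespaced original; this is a hypothetical post-processing step, not the change made in LaTeXML.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def copy_xml_lang_to_lang(root):
    """Replace xml:lang with lang on every element in the tree, not just the root."""
    for elem in root.iter():
        value = elem.get(XML_LANG)
        if value is not None:
            elem.set("lang", value)
            del elem.attrib[XML_LANG]
    return root


if __name__ == "__main__":
    doc = ET.fromstring(
        '<html xml:lang="en"><body>'
        '<p>Hello</p><p xml:lang="fr">Bonjour</p>'
        "</body></html>"
    )
    copy_xml_lang_to_lang(doc)
    print(ET.tostring(doc, encoding="unicode"))
```

Walking the whole tree with `root.iter()` is what covers the nested case; handling only `root` would reproduce the top-element-only behavior described above.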

@dginev modified the milestones: LaTeXML-0.8.4, LaTeXML-0.8.3 Jul 27, 2018