Wrong encoding in HTML output of references #919
Comments
Essentially this is the same error as #918.
Oh, maybe not a dup after all; somehow the encoding of the bib file is not being seen correctly.
I did some digging today, and this bug is a good old friend - we are looking at bibliography fields which have undergone two UTF-8 encoding passes, rather than one, hence the garbled content. From as early as getting read in by You can reproduce the bad encoding by simply running:
which prints doubly encoded unicode. And likewise you can validate that the input encoding is not handled right, as using the alternative invocation:
creates the correct XML output, as the bibliography strings are first decoded to native Perl strings. I wonder if it may be worth trying to auto-detect Unicode inputs and decode them by default, as this could also lead to subtle bugs in regex matching down the road. All that said, the converter used in MakeBibliography isn't connected to the
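To make the "two UTF-8 encoding passes" concrete, here is a minimal sketch in Python rather than Perl. It is only an illustration of the mechanism, not the LaTeXML code path: the field value `Gödel` and all variable names are made up for the example. It relies on the fact that undecoded bytes get treated as one character per byte (effectively Latin-1), so encoding them again on output doubles up every non-ASCII character.

```python
# Illustration only: a Python stand-in for the Perl behaviour discussed above.
# The field value and all names here are hypothetical.
original = "G\u00f6del"  # "Gödel", as it would appear in a bibliography field

# The bytes as stored in a UTF-8 encoded .bib file.
raw_bytes = original.encode("utf-8")

# Bug: the bytes are never decoded, so each byte is treated as its own
# character (equivalent to reading the file as Latin-1) ...
mistaken_text = raw_bytes.decode("latin-1")
# ... and the output layer then applies UTF-8 encoding a second time.
double_encoded = mistaken_text.encode("utf-8")

# Fix: decode the input to a native string first, then encode once on output.
correct_text = raw_bytes.decode("utf-8")
single_encoded = correct_text.encode("utf-8")

# What a browser reading the output as UTF-8 would display in each case.
garbled_display = double_encoded.decode("utf-8")
correct_display = single_encoded.decode("utf-8")
```

With these values the double-encoded output carries four bytes for the `ö` instead of two, which is the kind of garbling visible in the HTML references.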
Ah, good to clarify - that's on my #920 branch. On master that command will output the correct Unicode, because we never encode the output for STDOUT. So the bug in #918 is distinct and in fact concealed the double-encoding bug here - which can be seen if you instead write to a file on the
which is double-encoded, while the explicit encoding fixes things:
So, really what's going on is that BibTeX just passes through whatever bytes it got, other than the parts it specifically recognizes. Its output is expected to be processed within a LaTeX document that has loaded whatever packages are needed, in particular Thanks for the report!
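Since the bytes arrive untouched, whoever consumes them has to pick an encoding. One possible shape for the auto-detection idea floated earlier in the thread is sketched below in Python. This is an assumption about how such a fallback could look, not what LaTeXML does; the function name `decode_bib_bytes` and the Latin-1 fallback are both hypothetical choices.

```python
def decode_bib_bytes(data: bytes) -> str:
    """Decode raw bibliography bytes to a native string.

    Hypothetical helper: try strict UTF-8 first, and fall back to
    Latin-1 (which accepts every byte sequence) if that fails.
    """
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume a single-byte legacy encoding.
        return data.decode("latin-1")


# The same field, once from a UTF-8 file and once from a Latin-1 file.
from_utf8_file = decode_bib_bytes(b"G\xc3\xb6del")
from_latin1_file = decode_bib_bytes(b"G\xf6del")
```

Both calls yield the same native string, so any later encoding step only runs once. The trade-off is that a Latin-1 file which happens to form valid UTF-8 sequences would be misread, so an explicit encoding option remains the safer route.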
I use the following test documents,
test.tex:
and test.bib:
On Mac OS X with basictex and latexml 0.8.2 I do the following
When I open test.html in Chrome I see the following:

Note the encoding issues in the reference to the book and also in the author and title of the bibliography entry.
The bibliography file test.bib is stored as utf-8:
The corresponding PDF output of latex and bibtex doesn't show these issues. So I guess this could be an encoding issue in the bibtex processing of latexmlpost.