Wrong encoding in HTML output of references #919
I did some digging today, and this bug is a good old friend: the bibliography fields have gone through two UTF-8 encoding passes instead of one, hence the garbled content.
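For anyone unfamiliar with this failure mode, here is a minimal Python sketch of what two encoding passes do to a field. It uses Python rather than Perl only for brevity, and `double_encode` and `repair` are made-up names for illustration. The first pass is correct. The second treats each UTF-8 byte as a separate Latin-1 character and encodes it again. This is also what happens in Perl when an undecoded byte string is printed through an `:encoding(UTF-8)` layer.

```python
def double_encode(text: str) -> bytes:
    """Simulate the bug: a correct UTF-8 pass, then a second, spurious one."""
    once = text.encode("utf-8")
    # The UTF-8 bytes are misread as Latin-1 characters (one char per byte),
    # so the second pass turns every byte >= 0x80 into two bytes.
    return once.decode("latin-1").encode("utf-8")

def repair(data: bytes) -> bytes:
    """Undo one spurious pass; useful only to confirm the diagnosis."""
    return data.decode("utf-8").encode("latin-1")

name = "Gödel"
print(double_encode(name).decode("utf-8"))                   # GÃ¶del
print(repair(double_encode(name)) == name.encode("utf-8"))   # True
```

The `Ã¶`-style pairs are the telltale sign of a second pass. Reversing it as `repair` does only confirms the diagnosis. The real fix is to decode once on input.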
From as early as getting read in by
You can reproduce the bad encoding by simply running:
which prints doubly encoded Unicode. Likewise, you can confirm that the input encoding is mishandled, since the alternative invocation:
produces the correct XML output, because the bibliography strings are first decoded into native Perl (character) strings. It may be worth trying to auto-detect Unicode input and decode it by default, since leaving it as raw bytes could also cause subtle bugs in regex matching further down the line.
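Here is a rough sketch of the auto-detection idea, again in Python and with a hypothetical `decode_input` helper. It accepts the bytes if they are valid UTF-8 and otherwise falls back to Latin-1. The fallback is an assumption on my part, not existing LaTeXML behavior. In Perl the analogous strict attempt would be `Encode::decode('UTF-8', $bytes, Encode::FB_CROAK)` inside an `eval`. The sketch also shows the regex concern in Python's `re`: with a bytes pattern, `\w` is ASCII-only, so an accented letter splits a word. The exact symptoms in Perl differ, but the root cause (matching on bytes instead of characters) is the same.

```python
import re

def decode_input(raw: bytes) -> str:
    """Hypothetical auto-detect: accept valid UTF-8, else fall back to Latin-1."""
    try:
        return raw.decode("utf-8")    # strict: raises on invalid byte sequences
    except UnicodeDecodeError:
        return raw.decode("latin-1")  # assumed fallback; every byte maps to a char

raw_field = "author = {Kurt Gödel}".encode("utf-8")

# Matching on undecoded bytes: \w is ASCII-only, so the name is split apart.
print(re.findall(rb"\w+", raw_field))               # [b'author', b'Kurt', b'G', b'del']
# Matching after decoding: the accented letter is a single word character.
print(re.findall(r"\w+", decode_input(raw_field)))  # ['author', 'Kurt', 'Gödel']
```

The heuristic can misfire. Some Latin-1 text happens to be valid UTF-8, so a failed strict decode is a strong signal, but a successful one is not proof.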
All that said, the converter used in MakeBibliography isn't connected to the
Ah, good to clarify: that's on my #920 branch. On master, that command outputs the correct Unicode, because we never encode the output for STDOUT. So the bug in #918 is distinct, and in fact it concealed the double-encoding bug here, which can be seen if you instead write to a file on the
which is double-encoded, while the explicit encoding fixes things:
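To spell out why decoding explicitly fixes the file output, here is a generic Python sketch. It is not the LaTeXML invocation above, and `write_utf8` and `round_trip` are hypothetical helpers. The output layer encodes text exactly once, so everything hinges on whether the field was decoded on the way in. Undecoded bytes reach the layer as one character per byte and come out double-encoded.

```python
import os
import tempfile

def write_utf8(path: str, text: str) -> None:
    """Encode exactly once, at the output boundary (like an :encoding(UTF-8) layer)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)

def round_trip(text: str) -> bytes:
    """Write text through the UTF-8 output layer and return the bytes on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.xml")
        write_utf8(path, text)
        with open(path, "rb") as fh:
            return fh.read()

raw = "Gödel".encode("utf-8")  # bytes as read from the .bib file

# Bug: the bytes were never decoded, so each byte is treated as its own character.
print(round_trip(raw.decode("latin-1")))  # b'G\xc3\x83\xc2\xb6del'
# Fix: decode explicitly on input; the output layer then encodes only once.
print(round_trip(raw.decode("utf-8")))    # b'G\xc3\xb6del'
```

This is why writing to STDOUT without any output encoding (as on master) looks correct. The undecoded bytes pass straight through, and the two mistakes cancel out.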
So what's really going on is that BibTeX just passes through whatever bytes it receives, apart from the parts it specifically recognizes. Its output is expected to be processed within a LaTeX document that has loaded whatever packages are needed, in particular
Thanks for the report!