latexmlpost: parser error : Input is not proper UTF-8, indicate encoding ! #918

Closed
asmaier opened this Issue Jan 6, 2018 · 3 comments

asmaier commented Jan 6, 2018

On Mac OS X with BasicTeX and LaTeXML version 0.8.2 installed, running the command

latexml test.tex | latexmlpost --verbose --dest=test.html -

on test.tex:

\documentclass[a4paper,12pt,twoside,openright]{book}
\usepackage[english]{babel}		
\usepackage[utf8]{inputenc} 
\usepackage{natbib}
\begin{document}
\tableofcontents
\chapter{Präfatßio}
Löräm ipßum dolor ßit amet, conßäctetur adipisiki älit, sed äüsmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exärcitation ullamco laboris nisi ut aliquid ex ea commodi conßequat. Quis aüte iüre reprähenderit in voluptate velit eße cillüm dolore äu fugiat nulla pariatur. Exceptäür sint öbcäcat cupiditat nön proident, ßünt in kulpa kwi offitßia deserunt mollit anim id äßt laborum \cite{Wuersst1887}.
\nocite{*}
\bibliographystyle{apalike}
\bibliography{test}
\end{document}

I see the following error generated by latexmlpost

processing started Sat Jan  6 15:27:34 2018:12: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xA0 0x31 0x22 0x20
  <chapter frefnum="Chapter 1" refnum="1" xml:id="Ch1">
                           ^
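As a rough aid to reading the error, the four reported bytes can be decoded by hand. The sketch below assumes they come from the `frefnum="Chapter 1"` attribute on the quoted line; that link is an inference from the caret position, not something the log states.

```python
# Bytes quoted in the parser error message.
reported = bytes([0xA0, 0x31, 0x22, 0x20])

# Interpreted as ISO-8859-1 they are a no-break space (U+00A0) followed
# by `1" `, which would be consistent with the end of frefnum="Chapter 1".
as_latin1 = reported.decode("latin-1")
print(repr(as_latin1))

# Interpreted as UTF-8 they are invalid: 0xA0 is a continuation byte
# and cannot begin a character.
try:
    reported.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError as err:
    valid_utf8 = False
    print(err.reason)

# In UTF-8 the same no-break space would occupy two bytes.
nbsp_utf8 = "\u00a0".encode("utf-8")
print(nbsp_utf8)
```

If that reading is right, the parser is objecting to a single-byte no-break space, which points at the encoding of the XML stream rather than at the LaTeX source.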

The original file is valid UTF-8:

$ file test.tex 
test.tex: LaTeX 2e document text, UTF-8 Unicode text, with very long lines

However, when I check the XML file generated by latexml, it seems to be ISO-8859 and not UTF-8:

$ latexml test.tex > test.xml
$ file test.xml 
test.xml: XML 1.0 document text, ISO-8859 text, with very long lines
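The `file` output is a heuristic, so a stricter check may help confirm it. The following is a minimal sketch, not part of LaTeXML: the helper name `first_invalid_utf8_offset` is hypothetical, and the sample string is the chapter title from test.tex.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding,
    or None if the whole input is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


title = "Präfatßio"

# Encoded as UTF-8, the title passes the check.
print(first_invalid_utf8_offset(title.encode("utf-8")))

# Encoded as ISO-8859-1, decoding fails at the single byte used for "ä".
print(first_invalid_utf8_offset(title.encode("latin-1")))
```

To apply it to the generated output, one could pass in the raw bytes of the file, for example `open("test.xml", "rb").read()`.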

Am I doing something wrong or is this an encoding issue?

Owner

brucemiller commented Jan 6, 2018

Hmm, I see. The encoding is getting munged when it goes through stdout. Normally I use the two-step process

latexml --dest=test.xml test
latexmlpost --dest=test.html test

so I'm not seeing that. I'll have to look into that.
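For anyone who wants to script the two-step process as a workaround, here is a minimal sketch. It assumes `latexml` and `latexmlpost` are on the PATH and uses only the `--dest` option shown in the commands above; the function names are hypothetical.

```python
import subprocess


def two_step_commands(stem: str):
    """Build the two commands for a given file stem (e.g. "test")."""
    return [
        ["latexml", f"--dest={stem}.xml", stem],
        ["latexmlpost", f"--dest={stem}.html", stem],
    ]


def convert(stem: str) -> None:
    """Run both steps in order, stopping at the first failure.

    Writing the intermediate XML to a file avoids passing it through
    a pipe, which is where the encoding appears to go wrong.
    """
    for cmd in two_step_commands(stem):
        subprocess.run(cmd, check=True)
```

Calling `convert("test")` would then run the same two commands as above.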

asmaier commented Jan 6, 2018

I just used the pipe because it was suggested in the documentation: http://dlmf.nist.gov/LaTeXML/manual/usage/

Collaborator

dginev commented Jan 6, 2018

Off-topic, but somewhat related: maybe I should revive the discussion about eventually making latexmlc the default executable. That would simplify the docs as well, since we could have single-command entries for any use case (it unifies latexml+latexmlpost).
