Prevent substitutions for certain sequences on which LaTeX chokes #2

Closed
jxxcarlson opened this issue Nov 5, 2014 · 3 comments

@jxxcarlson

The following characters or character sequences are mapped to HTML entities that cause LaTeX
to choke: pdflatex crashes or misbehaves, while xelatex proceeds but leaves ugly entity traces in the rendered output.

The list:

--, which maps to &#8201;&#8212;&#8201;
' (apostrophe), which maps to &#8217;
>, which maps to &gt;
<, which maps to &lt;

I will add to this as I test more docs.

For the moment I run the output of asciidoctor-latex-converter through the filter below before running latex:

sed -e 's/&#8201;&#8212;&#8201;/--/g' -e "s/&#8217;/'/g" -e 's/&gt;/>/g' -e 's/&lt;/</g' foo.tex > foo2.tex

@jcsalomon

Any HTML entity &…; will cause LaTeX to choke, since the ampersand has a particular meaning (array alignment). And even if it didn’t choke, LaTeX still would not understand the entities.

I would recommend a two-step solution:

  1. Replace entities with their Unicode meanings, and
  2. Sanitize the output, e.g., replacing ampersands with \&, backslashes with \textbackslash{}.
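The two steps above can be sketched as follows. This is a minimal Python illustration, not the converter's actual Ruby code: the function names and the `LATEX_ESCAPES` table are hypothetical, and the table covers only the common LaTeX special characters. It assumes the input is plain text content (before any LaTeX markup is wrapped around it), since running it on a finished .tex file would also escape the LaTeX commands themselves.

```python
import html
import re

# Hypothetical escape table for illustration; a real converter may need more entries.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}
_SPECIAL = re.compile("|".join(re.escape(ch) for ch in LATEX_ESCAPES))


def entities_to_unicode(text):
    """Step 1: replace HTML entities (named or numeric) with Unicode characters."""
    return html.unescape(text)


def escape_latex(text):
    """Step 2: escape LaTeX special characters.

    A single regex pass is used so that the braces in \\textbackslash{}
    are not escaped again by a later replacement.
    """
    return _SPECIAL.sub(lambda m: LATEX_ESCAPES[m.group(0)], text)


def sanitize(text):
    """Decode entities first, then escape what is left."""
    return escape_latex(entities_to_unicode(text))
```

The order matters: decoding first means an entity such as `&amp;` becomes a literal ampersand, which the second step then escapes to `\&`. Escaping first would turn the entity's own ampersand into `\&` and leave `amp;` behind.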

@jxxcarlson

I am making some progress on this. There is now a set of tests in asciidoctor-latex/tests (still in my forked branch) which provides some control over this. Running ru big -t (generates big.tex and big.pdf from big.doc) now produces no failures caused by HTML entities or bad characters. Not a guarantee of a complete solution, but progress.

The LaTeX converter uses htmlentities in an extension, ent_to_uni, to map HTML entities to Unicode.
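One way to check for leftover entities in generated output is a simple scan like the sketch below. It is written in Python for illustration and is not part of the test suite mentioned above; `leftover_entities` and the `ENTITY` pattern are hypothetical names. The pattern is a heuristic: it reports anything shaped like `&name;` or `&#1234;`, so a legitimate alignment ampersand followed directly by a word and a semicolon would be a false positive.

```python
import re

# Matches named (&lt;), decimal (&#8217;), and hexadecimal (&#x2019;) entities.
ENTITY = re.compile(r"&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def leftover_entities(tex_source):
    """Return (line_number, entity) pairs for entity-like sequences in the text."""
    found = []
    for number, line in enumerate(tex_source.splitlines(), start=1):
        for match in ENTITY.finditer(line):
            found.append((number, match.group(0)))
    return found
```

An empty result on a generated .tex file suggests that no entities survived the conversion, though it says nothing about other characters LaTeX may reject.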

jirutka pushed a commit to jirutka/asciidoctor-latex that referenced this issue Dec 16, 2014
@jxxcarlson

Let foo.adoc be the file with contents

== Test of bad characters

He said — well, um, er — he said that he is sorry.

He also said that 1 < 2.  On this, we can all agree.

Run

$ asciidoctor-latex foo.adoc
$ xelatex foo.tex

No problems.
