Fix text extraction from PDFs generated from LaTeX #151

Closed
flavioamieiro opened this Issue May 6, 2013 · 1 comment

@flavioamieiro
Owner

flavioamieiro commented May 6, 2013

When a PDF is generated from LaTeX using pdflatex, some words are broken up in the extracted text, especially where there are ligatures and accent marks. This LaTeX source:

\documentclass{article}

\begin{document}
Matem\'atica
\end{document}

will be extracted as:

Matem´
atica
1
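
The breakage above could in principle be patched after extraction. The following is only a minimal sketch, not something proposed in this thread: it assumes the extractor emits a spacing accent character (such as U+00B4) before the base letter, possibly separated by whitespace or a line break, as in the output shown. The function name `rejoin_accents` and the accent table are hypothetical and cover only a few marks; the grave accent is left out on purpose because a plain backtick is too common in ordinary text.

```python
import re
import unicodedata

# Assumed mapping from spacing accents (as seen in the extracted text)
# to the corresponding Unicode combining marks. Illustrative, not exhaustive.
SPACING_TO_COMBINING = {
    "\u00b4": "\u0301",  # acute accent
    "\u02c6": "\u0302",  # modifier letter circumflex
    "\u02dc": "\u0303",  # small tilde
    "\u00a8": "\u0308",  # diaeresis
}

# A spacing accent, optional whitespace (including newlines), then one letter.
_ACCENT_THEN_LETTER = re.compile(
    "([" + re.escape("".join(SPACING_TO_COMBINING)) + r"])\s*([^\W\d_])"
)


def rejoin_accents(text):
    """Attach stray spacing accents to the letter that follows them."""

    def _merge(match):
        accent, letter = match.groups()
        # Put the combining mark after the letter, then compose with NFC.
        return unicodedata.normalize("NFC", letter + SPACING_TO_COMBINING[accent])

    return _ACCENT_THEN_LETTER.sub(_merge, text)


if __name__ == "__main__":
    print(rejoin_accents("Matem\u00b4\natica\n1"))
```

A heuristic like this can also merge an accent that was genuinely meant to stand alone, so it would need checking against real documents before being relied on.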
@turicas

Collaborator

turicas commented May 6, 2013

Some options (that will require more work to run on worker Extractor) are:

@fccoelho closed this Jun 9, 2016
