Convert PDF graphics to scalable SVGs #902

bfirsh · 2017-12-08T10:52:57Z

When using --graphicsmap=pdf.svg, it converts the graphic to an SVG that contains a raster rendering of the PDF. I would expect it to produce a vector graphic instead. The same presumably applies to EPS, AI, and PS.

For Engrafo, we had success using pdf2svg. Presumably the same result could be achieved by piping the graphic through the same LaTeX rendering system that renders math and tikzpicture as SVG.


brucemiller · 2017-12-08T12:49:24Z

But does it always generate a raster? I'd think it would depend on what kind of drawing the PDF itself contains: line-oriented content would generate vectors, but a PDF with an embedded raster is going to generate a raster in the SVG. Do you have any small samples where you'd expect vectors?

[We're using ImageMagick for almost all image conversion. It's finicky, but it's nice to be able to rely on a single tool/dependency.]

bfirsh · 2017-12-08T13:48:22Z

If it’s vector in the source PDF, it’s vector in the output SVG. If it’s raster in the source PDF, it’s raster in the output SVG. That’s what pdf2svg does.

Presumably the same process that produces the math and TikZ SVGs would do the same thing? (I’m not sure how that works, but it seems to run them through TeX and then somehow output an SVG.) If that system were used, it wouldn’t add any dependencies.

brucemiller · 2017-12-08T13:51:16Z

> On 12/08/2017 08:48 AM, Ben Firshman wrote:
> If it’s vector in the source PDF, it’s vector in the output SVG. If it’s raster in the source PDF, it’s raster in the output SVG. That’s what pdf2svg does.

Yeah; that's my question: Does LaTeXML (using ImageMagick) *not* do that? Does it always generate a raster?

dginev · 2017-12-08T15:11:29Z

Oh, this is a topic I have some very painful experience with, so let me say a few words.

First, ImageMagick has "pathological" behavior on certain PDF/EPS inputs (very hard to classify or predict), in particular PDFs that encode vector graphics. By pathological I mean that it may do any of the following: enter an infinite loop, throw an out-of-memory exception, or fail silently, producing no image and leaving files behind on the filesystem...

One workaround, widely used and recommended in places such as Stack Overflow, is to delegate vector PDFs to a different processing engine, in particular a headless Inkscape process. I have seen this work very reliably in the past, but it also exhibits the inverse problem: there are pathological images that Inkscape cannot convert in, say, 10 minutes, which finish in a few seconds in ImageMagick. And vice versa, generally following the vector vs. pixel distinction.

On the upside, and to return to your discussion here: when Inkscape succeeds with the conversion, the resulting SVG truly preserves the vector definitions in the PDF, and the final image is high quality (provided you get web fonts that match the PDF fonts; otherwise you get authors writing to you that the kerning is off by several millimeters...).

Should LaTeXML rely on Inkscape as well for these cases? Hard to say; it is a rather large dependency and may feel better as a plugin than as a core component. There are entire companies that deal with image conversion and hosting, so it's an admittedly large and non-trivial problem. It may also be worth thinking about exactly where the line should be drawn between LaTeXML and a general-purpose image conversion tool.

Lastly, arXiv conversions have given me endless grief in these "pathological" cases: you can't rely on the LaTeXML process to time out of, or back out of, an underlying infinite loop in C code (where ImageMagick can get stuck), so you need an external watchdog process to monitor it. Covering for this ended up being a big part of what LaTeXML-Plugin-Cortex does.

Related issues for some background: #663 and #666
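A minimal sketch of such an external watchdog, written in Python purely for illustration (it is not how LaTeXML or LaTeXML-Plugin-Cortex implement it, and the name `run_with_watchdog` and its return values are hypothetical). It starts the converter in its own process group so that helpers it spawns, such as Ghostscript when ImageMagick rasterizes a PDF, are killed together with it once the time limit is exceeded. It assumes a POSIX system.

```python
import os
import signal
import subprocess


def run_with_watchdog(cmd, timeout_s):
    """Run an external converter with a hard time limit (POSIX only).

    Hypothetical helper. Returns ("ok", 0), ("failed", returncode),
    or ("timeout", None).
    """
    # start_new_session=True puts the child in a new process group, so any
    # helpers it spawns can be killed together with it.
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            # Kill the whole process group, not just the direct child.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited on its own
        proc.wait()  # reap the killed child
        return ("timeout", None)
    return ("ok" if returncode == 0 else "failed", returncode)


# Hypothetical usage: give ImageMagick 10 minutes for one figure.
# status, code = run_with_watchdog(["convert", "fig8.pdf", "fig8.png"], 600)
```

Killing the process group rather than only the direct child matters here: the stuck infinite loop may live in a delegate process (e.g. gs) rather than in the process that was launched.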

bfirsh · 2017-12-14T13:25:00Z

@brucemiller Here's an example of a vector PDF figure: cifar10_48-48-10_batch_10_plot.pdf

Here's the latexml output: https://www.arxiv-vanity.com/papers/1703.00441v2/#S4.F2.sf3

It is also particularly fuzzy because the DPI is not configurable, but that's another problem!

pdf2svg converted this to a scalable SVG without problems.

bfirsh · 2017-12-14T13:27:17Z

@dginev Agreed that a plugin is a good place to start. Perhaps it could be optional core functionality, so it isn't a hard dependency.

I might have a shot at a pdf2svg plugin to fix this for Arxiv Vanity, if I get round to it.
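One way to keep pdf2svg optional rather than a hard dependency is to check whether the executable is on the PATH and fall back to the existing ImageMagick raster path otherwise. A minimal Python sketch under that assumption: `conversion_command` is a hypothetical helper, not part of LaTeXML; the pdf2svg call follows its `pdf2svg <in.pdf> <out.svg> [page]` usage, and `convert` is the ImageMagick 6 command-line tool (ImageMagick 7 uses `magick`).

```python
import shutil
from pathlib import Path


def conversion_command(pdf_path, page=1, which=shutil.which):
    """Return (argv, output_path) for converting one page of a PDF figure.

    Hypothetical helper. Prefers pdf2svg, which keeps vector content as
    vectors; if it is not installed, falls back to rasterizing with
    ImageMagick. `which` is injectable so the choice can be tested
    without either tool installed.
    """
    pdf = Path(pdf_path)
    if which("pdf2svg"):
        svg = pdf.with_suffix(".svg")
        # pdf2svg <in.pdf> <out.svg> [<page number>], pages counted from 1
        return ["pdf2svg", str(pdf), str(svg), str(page)], svg
    png = pdf.with_suffix(".png")
    # ImageMagick selects a page with a zero-based index suffix: file.pdf[0]
    return ["convert", f"{pdf}[{page - 1}]", str(png)], png
```

Returning the argument list rather than running it keeps the choice of converter separate from execution, so the same command could be handed to a timeout wrapper.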

brucemiller · 2018-02-17T11:23:23Z

Ah, yes, of course ImageMagick isn't preserving the vectorness; it's basically a pipeline of raster operations, so the first thing it does is convert to an internal raster. By the same token, even if we introduce a dependency on pdf2svg to keep the image in vector form, we'd need a vector alternative duplicating the whole transform sequence (all the stuff that graphicx brings in, such as scaling, rotation, and clipping). With SVG this is of course possible, maybe even "easy" in some sense, but it's a whole bunch of new code & testing. In other words, a bit tricky.
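To illustrate the kind of code involved, here is a minimal Python sketch (hypothetical helpers, not LaTeXML code) of two pieces a vector path would need for graphicx's `scale` and `angle` keys: an SVG `transform` attribute, and the size of the rotated bounding box the surrounding layout must reserve. graphicx rotates counterclockwise for positive angles, while SVG's y axis points downward so a positive `rotate()` turns clockwise on screen; the sign is therefore flipped. Translating the rotated content back into the viewport, clipping, and trimming are deliberately left out.

```python
import math


def graphicx_transform(scale=1.0, angle=0.0):
    """Build an SVG transform attribute for graphicx-style scale/angle keys."""
    parts = []
    if angle:
        # graphicx: positive angle = counterclockwise; SVG (y axis down):
        # positive rotate() = clockwise on screen, so negate the angle.
        parts.append(f"rotate({-angle:g})")
    if scale != 1:
        parts.append(f"scale({scale:g})")
    return " ".join(parts)


def rotated_bbox(width, height, angle):
    """Width and height of the axis-aligned box enclosing a rotated w x h box."""
    t = math.radians(angle)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    return (width * c + height * s, width * s + height * c)


def wrap_in_group(svg_body, transform):
    """Wrap already-converted SVG content in a transformed <g> element."""
    if not transform:
        return svg_body
    return f'<g transform="{transform}">{svg_body}</g>'
```

Because the rotation is about the SVG origin, a real implementation would also need a `translate()` derived from the rotated bounding box to bring the content back into view; that bookkeeping is part of the "whole bunch of new code & testing".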

dginev · 2019-04-08T00:13:25Z

A bit too open-ended for 0.8.4, pushing back to 0.9 until we have an attack plan in mind.

dginev · 2022-01-25T14:35:02Z

To pin down a concrete high-difficulty test for this direction of work, today I encountered arXiv:1804.00311.

That article has multiple graphics using PDF assets that take more than 5 minutes to convert via Ghostscript, and that even hit API errors for metadata operations such as obtaining the size. If we can handle such cases more efficiently and correctly, while also starting to produce SVG for them, that would be an excellent outcome.

dginev · 2024-03-01T16:57:31Z

Today I also stumbled on another gs-intensive example, from arXiv:1807.01606. I'm attaching one PDF asset for future testing: running gs on it takes 11 minutes on my machine.

fig8.pdf

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\includegraphics[width=10cm]{fig8.pdf}
\end{document}
```

Resulting PNG:

Since the article has 15 of these PDFs, it reliably times out with the current build setup.

bfirsh mentioned this issue Dec 8, 2017: --graphicsmap=pdf.svg doesn't seem to work #894 (Closed)

bfirsh changed the title ~~PDF, EPS, etc graphics should be converted to scalable SVGs~~ Convert PDF graphics to scalable SVGs Dec 8, 2017

bfirsh mentioned this issue Dec 8, 2017: Convert PDF, EPS, etc to scalable SVGs arxiv-vanity/engrafo#247 (Open, 1 task)

dginev added the enhancement label Dec 8, 2017

dginev added this to the LaTeXML-0.8.4 milestone Dec 8, 2017

matteosecli mentioned this issue Feb 6, 2018: [bug] Strange behavior of TikZ #945 (Closed)

dginev added the postprocessing label Feb 22, 2018

This was referenced Oct 23, 2018:

- Make imagemagick DPI configurable #1066 (Closed)
- [Experiment] Convert PDF graphics to SVG with dvisvgm #1067 (Closed)

dginev modified the milestones: LaTeXML-0.8.4, LaTeXML-0.9 Apr 8, 2019

dginev mentioned this issue Sep 29, 2021: Smarter sizing in PDF to PNG image conversion #1665 (Closed)

dginev mentioned this issue Oct 16, 2021: Experiment: disable clip in image conversion #1695 (Closed)

dginev mentioned this issue Jan 25, 2022: avoid tripping over on images ghostscript cant size #1786 (Closed)
