Safer ImageMagick conversions #663
Also, since I am planning on rerunning arXiv in the coming couple of weeks, and I am certain I will encounter this exact problem, having some solution prior to the run would be very helpful. On the other hand I am running |
On 09/23/2015 12:58 PM, Deyan Ginev wrote:
One big difficulty is that we make the conversions through Indeed, the very fact that external programs are being invoked It is conceivable (though non-trivial, if I remember the coding) In any case, do you have a test document that reproducibly |
Yes, I now have a really bad EPS file that I can share, but not yet publicly. I'll get permission to mail it to you in private. Its embargo date expires soon, so we could eventually make it part of the test suite.
So, just checked, and it seems the Perl ImageMagick module uses XS to directly interface with the imagemagick system libraries. Which is nice because it is efficient, but also explains why LaTeXML itself becomes "very dead" in some cases, where only a I do, however, have a pragmatic suggestion - and that is to do a Perl-level Does that sound acceptable to you? I can whip up a PR today/tomorrow to illustrate. |
On 09/24/2015 11:52 AM, Deyan Ginev wrote:
Yes, but it sounds like the only problem area is when the imagemagick And it turns out there's a file delegates.xml that describes how Alternatively, if it's really only ps (pdf? AI?) that we need to
Yeah, that sounds like the necessary approach, I guess. Kinda worried about performance: math to image conversion |
I think we only need to fork when it is a user-supplied image file, not when we are doing latex-based images. I have yet to see those fail so badly.
On 09/24/2015 01:41 PM, Deyan Ginev wrote:
probably right and I was just thinking the same thing: we can likely |
Btw, I just surveyed a random sample of the arXiv submissions in 2014 and 2015 - there are still a LOT of |
On 09/24/2015 02:06 PM, Deyan Ginev wrote:
It might be that only LaTeXML::Util::Image::image_read needs the [although LaTeXImages uses the sequence image_object, ->Read, which |
On 09/24/2015 02:06 PM, Deyan Ginev wrote:
Oh, and I think that ghostscript gets invoked on pdf and ai (adobe |
Correct, I have also seen recent use of PDF for storing very high quality vector graphics (which shocked me out of my mind). |
Pull request #666 gave a seemingly sensible patch to manage this problem, by setting ImageMagick-specific environment variables. However, testing strongly suggested that the timing control And this raises the question of what the various memory and disk limits are measuring; are they also just measuring the current process? I'm inclined to comment out the code in this patch, leaving it (and maybe additional comments here) as documentation of what you might try if you're in a server situation. I would also probably close this bug as "unfixable" for routine use: if you're running from the command line, just kill it; in server situations, you've just got to protect it more.
It is fixable, we just need to fork() the image conversion and track its time from the parent process. Perl can also kill -9. It's sad that the option doesn't do what we expect it to, but we can emulate it nicely. |
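To make the fork-and-kill idea concrete, here is a minimal sketch. It is written in Python purely for illustration; LaTeXML is Perl, and nothing below is its actual code. `run_with_hard_timeout` is a hypothetical helper name. Starting the converter in its own process group is an added assumption of mine, so that a delegate such as gs spawned by convert is killed along with it. It is POSIX-only.

```python
import os
import signal
import subprocess


def run_with_hard_timeout(cmd, timeout_seconds):
    """Run cmd as a child process and SIGKILL it if it runs too long.

    Hypothetical helper for illustration; returns (ok, message) so the
    caller can report *why* a conversion was abandoned.
    """
    # start_new_session=True makes the child a process-group leader, so any
    # grandchildren it spawns (e.g. a delegate program) share its group id.
    child = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
    try:
        # The parent, not the child, tracks the elapsed wall-clock time.
        returncode = child.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        try:
            # Equivalent of `kill -9` on the whole process group.
            os.killpg(child.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        child.wait()  # reap the child so no zombie is left behind
        return False, f"killed after {timeout_seconds}s timeout: {cmd[0]}"
    if returncode != 0:
        return False, f"exited with code {returncode}: {cmd[0]}"
    return True, "finished"
```

A call such as `run_with_hard_timeout(["convert", "figure.eps", "figure.png"], 60)` (file names and the 60-second budget are made up) would either return normally or come back with a message that can be turned into an informative error, instead of the whole run hanging.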
Also, if we measure the execution time of the current process, we can add the time on top of the max allowed image conversion time to the current runtime and set |
On 10/08/2015 04:00 PM, Deyan Ginev wrote:
Yeah, perhaps you're right. It still seems like |
Have you had any more experience with bad ghostscript on CorTeX? Does it suggest anything sane that can be done from within LaTeXML? Or is this just an issue that any hardened server will have to protect against using timeouts & environment variables?
Yes, I have had plenty of bad experiences. But now I have the ENV timeouts in place for ghostscript, which work wonderfully: https://github.com/dginev/LaTeXML-Plugin-Cortex/blob/master/bin/latexml_worker#L50 I still consider this an aspect LaTeXML can be better at. The benefit of doing it internally is that we can have informative error messages on why the timeout occurred, as opposed to a dirty kill of latexml (which CorTeX still does after 20 minutes and 20 seconds, for some pathological jobs). We can also do it once and not force every latexml user to do it from scratch, especially since there are situations where the problems are pathological and require a KILL signal.
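For readers who want to try the environment-variable route, here is a rough sketch, again in Python for illustration only. The `MAGICK_*` names are resource-limit variables documented by ImageMagick. The values, the `IMAGEMAGICK_LIMITS` table and the `limited_environment` helper are my own assumptions, and are not necessarily what the linked worker script sets. As noted earlier in this thread, the time limit may only account for the ImageMagick process itself, so this complements a hard kill from a parent process and does not replace it.

```python
import os

# Assumed, illustrative limits; tune them for the actual server.
IMAGEMAGICK_LIMITS = {
    "MAGICK_TIME_LIMIT": "60",        # seconds
    "MAGICK_MEMORY_LIMIT": "512MiB",  # heap memory for pixel caches
    "MAGICK_MAP_LIMIT": "1GiB",       # memory-mapped pixel caches
    "MAGICK_DISK_LIMIT": "2GiB",      # on-disk pixel caches
}


def limited_environment(base_env=None):
    """Return a copy of the environment with the ImageMagick limits added.

    The input mapping is not modified, so the limits only apply to
    processes that are explicitly started with the returned environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(IMAGEMAGICK_LIMITS)
    return env
```

The returned mapping can be passed as the `env` argument when spawning the converter (for example via `subprocess.Popen(cmd, env=limited_environment())`), which keeps the limits out of the parent's own environment.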
Upon my recent investigation of Perl's safe signals in #741, I was reminded of the ImageMagick breakage from arXiv, not to mention the newly found http://imagetragick.com. I would love to have LaTeXML take some protective measures against its image conversion dependencies, as otherwise using it for images in production is a no-go (luckily no need for that in the Authorea context).
I'm going to claim that LaTeXML is doing as much as it feasibly can here, partly just to shorten the list of issues to "possible" ones.
I had reported in the last CorTeX run (way back when in 2014) that there are "out of control" gs processes, running for hours and hogging tons of RAM, essentially in an infinite loop.
I have seen this problem at work also, and would like to share the flavor of the solution, although we may need to think about how exactly to adapt it to LaTeXML.
The best fix I know of is applicable to manually invoking convert and looks like this (the numbers can of course be adjusted):
It ensures two aspects:
timeout and the merciless 9 signal.
I am not immediately aware of the best way to transfer this to the LaTeXML calls to convert, but I definitely consider it a wise idea, as it would help the predictability of LaTeXML in all use cases.