LoadFromFile crashes on resources too big for memory #1386

Closed
jeffkaufman opened this Issue Sep 3, 2016 · 10 comments

jeffkaufman commented Sep 3, 2016

Here's a large (251MB) PDF: http://www.jefftk.com/mike-mulligan-obsolete-internal.pdf?PageSpeed=off

With PageSpeed on, it doesn't load: http://www.jefftk.com/mike-mulligan-obsolete-internal.pdf

    2016/09/03 10:02:31 [alert] 9914#0: worker process 29497 exited on signal 6
    terminate called after throwing an instance of 'std::bad_alloc'
      what():  std::bad_alloc

This is with LoadFromFile on an ngx_pagespeed system without much memory.

Possibly we're blindly trying to allocate enough memory for the whole file without checking its size first? Yes, I think that's it: neither FileInputResource::LoadAndCallback() nor FileSystem::ReadFile() checks the file size and refuses to load things that are too big. Anything over ImageResolutionLimitBytes makes no sense to load: that limit is for uncompressed images, and real images will be much smaller.
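
A minimal sketch of the kind of guard being discussed, assuming a stat-then-read shape; this is illustrative POSIX C++, not the actual FileSystem::ReadFile or FileInputResource code, and kMaxLoadFromFileBytes is a hypothetical limit:

    // Illustrative only: stat the file first and refuse to read it into
    // memory if it exceeds a limit, instead of allocating blindly.
    #include <sys/stat.h>

    #include <cstdint>
    #include <fstream>
    #include <iterator>
    #include <string>

    // Hypothetical limit; the real value would come from configuration.
    constexpr int64_t kMaxLoadFromFileBytes = 32 * 1024 * 1024;

    bool ReadFileWithLimit(const std::string& path, std::string* contents) {
      struct stat statbuf;
      if (stat(path.c_str(), &statbuf) != 0) {
        return false;  // Missing or unreadable.
      }
      if (static_cast<int64_t>(statbuf.st_size) > kMaxLoadFromFileBytes) {
        return false;  // Too big to load from file; skip or fall back.
      }
      std::ifstream file(path, std::ios::binary);
      if (!file) {
        return false;
      }
      contents->assign(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
      return true;
    }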

It's also surprising that we'd be doing this for a PDF, since we can't optimize that and we know that immediately from the extension. Looking at FileLoadPolicy::ShouldLoadFromFile, though, our rule for which content types to load from file isn't "can we optimize it" but "do we know what this extension means" + "is it a static resource type".

jmarantz commented Sep 3, 2016

Definitely we should be statting files and comparing size against some max value before allocating anything. I think we should be comparing against the max cacheable size, and not an image-specific size.

Since we do cache extend PDFs I think it does make sense for them to work with LoadFromFile. But I don't think it makes sense to do that in the IPRO flow, only the extend_cache filter. We might need to plumb through some context to FileInputResource to make that judgement call, and just skip file-loading .pdfs from IPRO.

jeffkaufman self-assigned this Sep 6, 2016

jeffkaufman commented Sep 6, 2016

We have a MaxCacheableContentLength, but that defaults to unlimited. Looking briefly, I don't see any other general limits on how big a resource we can optimize.

I'm thinking we should switch that default from unlimited to something high but not crazy, like 32MB, and then use it in LoadFromFile?
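
If we go that route, the nginx-side override would presumably use the usual `pagespeed <Option> <value>;` directive form, something like (32MB expressed in bytes; exact syntax to be confirmed):

    pagespeed MaxCacheableContentLength 33554432;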

jeffkaufman commented Sep 6, 2016

@hillsp What do you think? Should we lower the existing limit, or add a new limit?

hillsp commented Sep 6, 2016

Do we do any streaming rewriting? If not, I'm inclined to use the existing option.

jeffkaufman commented Sep 6, 2016

In theory we could do streaming minification of text formats, but nothing at all is actually designed for that to work, and text is rarely large enough for that to be worth it.

I'll plan to use the existing option then.

jeffkaufman added a commit to apache/incubator-pagespeed-ngx that referenced this issue Sep 9, 2016

jeffkaufman commented Sep 13, 2016

The repro link is no longer a query-of-death for jefftk.com, since I manually blacklisted LoadFromFile for .pdf files:

    pagespeed LoadFromFileRuleMatch Disallow \.pdf$;

jmarantz commented Sep 14, 2016

Isn't this still a problem for resources loaded via HTTP as well? That is, doesn't your server now just loopback-fetch the PDF and run into the same memory problems?

jeffkaufman commented Sep 14, 2016

> doesn't your server now just loopback fetch the pdf file and have the same memory problems?

We don't loopback fetch for IPRO, and we do abort recording if the content-length header is too large.
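
For context, the IPRO guard described here is essentially a declared-length check before buffering; a hypothetical sketch, not the actual recorder API:

    #include <cstdint>

    // Hypothetical guard: given the response's declared Content-Length
    // (-1 if absent) and a configured cap (-1 meaning unlimited), decide
    // whether to buffer the body for in-place optimization or abort
    // recording up front.
    bool ShouldRecordForIpro(int64_t content_length, int64_t max_cacheable_bytes) {
      if (max_cacheable_bytes >= 0 && content_length > max_cacheable_bytes) {
        return false;  // Too big: skip recording rather than buffering it all.
      }
      return true;
    }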

jmarantz commented Sep 14, 2016

But wouldn't extend_cache_pdfs trigger a loopback fetch? Or do you have that disabled on your server?

Also, is

    pagespeed Disallow *.pdf

a workaround for this?

jeffkaufman commented Sep 14, 2016

extend_cache_pdfs is off by default, and I haven't enabled it.

Disallowing PDFs globally would work, and is slightly better than what I did, which was just disabling LoadFromFile for PDFs.
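
For reference, the global form suggested above would presumably look like this in the nginx config (quoting and exact wildcard behavior as per the usual Allow/Disallow rules):

    pagespeed Disallow "*.pdf";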

jeffkaufman added a commit to apache/incubator-pagespeed-ngx that referenced this issue Sep 23, 2016

jeffkaufman added a commit to apache/incubator-pagespeed-ngx that referenced this issue Sep 23, 2016

jeffkaufman changed the title from "Failure on very large PDF with LoadFromFile" to "LoadFromFile crashes on resources too big for memory" Oct 11, 2016
