pagespeed css request times out with 404 while optimizing #1644
Comments
josh: Can you work around this short term with:

aroman: Yes, disabling optimization for the affected css files works.

josh: You know your way around our code, and you have a great testcase. With our guidance, do you want to take a shot at doing the fix yourself?

aroman: Yes. I'd prefer to serve the original resource with a private/300s TTL, but I'm willing to do a redirect if that's too hard.

josh: I don't think it's excessively hard. You'd have to initiate a CacheUrlAsyncFetcher fetch and block waiting for it to complete (because that's the way Apache works), transferring the results to the Apache request_rec object. Doing the redirect is easier because you could just call a function to extract the original URL from the .pagespeed. URL and generate the redirect response.

aroman: For combined resources, just increasing the timeout isn't going to help. These offending css files reference an absurd number of dependencies. Wouldn't the correct result be to simply concatenate the original resources if we can't serve the optimized/combined version in time?

josh: That would be better, yes. But I think by the end of the day you will have re-implemented https://github.com/pagespeed/mod_pagespeed/blob/master/net/instaweb/rewriter/css_combine_filter.cc and its dependencies. I think it might be better to rewrite the request to skip the CSS rewrites and then initiate a css rewrite, blocking indefinitely until it finishes. This is much heavier lifting than handling the single-URL proxy or redirect case.

> On Thu, Oct 12, 2017 at 9:03 AM Joshua Marantz jmarantz@google.com wrote (re: `InstawebHandler::HandleAsPagespeedResource()`): BlockingFetch has an undocumented timeout (default 5 seconds, settable in pagespeed.conf). There is a TODO to document it... If a .pagespeed. resource can't be fetched and rewritten within the timeout, a 404 is returned and the web page breaks.
We should, for non-combined resources, either redirect to the origin resource, or just serve the origin resource with a private/300s TTL. I think a temp redirect would be easier to implement. For combined resources, I think we should have a separate timeout with a much higher default value. Another option for CSS is to respond with a body containing CSS @import statements for the components, but I'm not sure that's technically correct 100% of the time, and for combined JS and sprites that would be a lot harder. So I think it might be better to plumb in a higher timeout for combined resources, but ultimately we'd have to respond with an error if the timeout is exceeded. And there's a question of rewritten/combined css files: the outermost pagespeed URL encoding will look like it has a single input, but transitively it's a combined resource, and so it could not be solved with a redirect. I thought in fact we had code somewhere for serving origin content for single-input .pagespeed. resources. But maybe that was for a different server?
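The "extract the original URL from the .pagespeed. URL" step described above can be sketched with a hypothetical helper. This is not mod_pagespeed's real decoder (the thread later points at DecodeUrl / rewrite_driver.cc for that); it only handles the common single-input naming scheme `<original-name>.pagespeed.<filter>.<hash>.<ext>` and would not be sufficient for combined or multi-input URLs:

```cpp
#include <string>

// Hypothetical sketch: recover the origin URL from a single-input
// .pagespeed. URL by truncating at the ".pagespeed." marker.
// e.g. "styles.css.pagespeed.cf.AbCdEf.css" -> "styles.css"
std::string OriginalUrlFromPagespeedUrl(const std::string& url) {
  const std::string marker = ".pagespeed.";
  std::string::size_type pos = url.find(marker);
  if (pos == std::string::npos) return url;  // not a rewritten URL
  return url.substr(0, pos);                 // everything before the marker
}
```

A 302 to the value this returns would be the "easier" path; serving the origin bytes directly needs the blocking CacheUrlAsyncFetcher fetch described above.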
aroman: The original resource should already be in the local file cache. I'd like to have BlockingFetch return the original resource within ~50 ms if the optimized resource can't be retrieved. The initial request has that behavior -- is there any way we can achieve that for the second request? Would it be as simple as making a callback that falls back to the original resource after a timeout?
josh: The original resource may be in the cache (it's not necessarily a file cache in general; it could be memcached or redis), but it may have been evicted or expired. In any case, CacheUrlAsyncFetcher will pull it from the closest cache, or fetch it over HTTP if it's not present, and then put it in the cache.
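The behavior described here can be illustrated with a tiny synchronous sketch. All names are hypothetical: the real CacheUrlAsyncFetcher is asynchronous and layered over the configured cache backends, whereas this stand-in uses a plain map:

```cpp
#include <functional>
#include <map>
#include <string>

// Illustrative cache-then-HTTP lookup: serve from the closest cache if
// present; otherwise fetch over HTTP and repopulate the cache on the way out.
std::string CacheThenHttpFetch(
    const std::string& url,
    std::map<std::string, std::string>& cache,  // stand-in for file/memcached/redis
    const std::function<std::string(const std::string&)>& http_fetch) {
  auto it = cache.find(url);
  if (it != cache.end()) return it->second;  // cache hit
  std::string body = http_fetch(url);        // miss (evicted, expired, or never seen)
  cache[url] = body;                         // repopulate for the next request
  return body;
}
```

The second request for the same URL is served from the cache without touching HTTP, which is why the fallback aroman wants should usually be fast.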
aroman: So I'm doing something like this in BlockingFetch:
which generally works great, except that I can have two writes to the callback. Is there any provision to accept only the first write to the callback and drop all subsequent writes?
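The snippet referenced above did not survive in this transcript. Below is a minimal, self-contained sketch of the shape being described -- block on the optimized fetch, and fall back to fetching the origin resource once a timeout expires -- using hypothetical names rather than the real BlockingFetch API. Note that the shared callback can fire twice, which is exactly the double-write problem:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

using Fetcher = std::function<void(std::function<void(std::string)>)>;

// Hypothetical sketch: wait up to `timeout` for the optimized fetch; if it
// misses the deadline, also fetch the origin resource. Both fetches write to
// the same callback, so it may be invoked twice.
void BlockingFetchWithFallback(const Fetcher& fetch_optimized,
                               const Fetcher& fetch_original,
                               std::chrono::milliseconds timeout,
                               std::function<void(std::string)> callback) {
  std::mutex mu;
  std::condition_variable cv;
  bool done = false;
  auto wrapped = [&](std::string body) {
    callback(std::move(body));  // NOTE: can fire twice if both fetches finish
    std::lock_guard<std::mutex> hold(mu);
    done = true;
    cv.notify_all();
  };
  std::thread opt([&] { fetch_optimized(wrapped); });
  {
    std::unique_lock<std::mutex> lock(mu);
    if (!cv.wait_for(lock, timeout, [&] { return done; })) {
      // Optimized rewrite missed the deadline: fall back to the origin bytes.
      lock.unlock();
      fetch_original(wrapped);
      lock.lock();
      cv.wait(lock, [&] { return done; });
    }
  }  // release the mutex before joining so a late optimized write can finish
  opt.join();
}
```

With a slow optimized fetch, the callback fires once with the origin bytes at the timeout and again when the optimized fetch finally completes, motivating the first-write-wins question below.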
I'm looking for something like a
josh: I think it should be straightforward to write MultiAsyncFetch (maybe call it RacingAsyncFetch?). It should be easy to create that in isolation and unit-test it vigorously. It would buffer separately all the writing from candidate fetchers, and transmit only the buffer of the fetch whose Done() method gets called first. One concern with that approach is that it could multiply the amount of work your server has to do.

More thoughts on your code: it's worth checking, in a unit test, whether DecodeUrl for a minified, combined css file will yield 1 element. Here's an example, from https://www.modpagespeed.com/combine_css.html?PageSpeed=on&PageSpeedFilters=combine_css,rewrite_css -- decoding that, I think, will yield 1 URL. I'm not sure what will happen if you fallback-fetch that. It might work -- we might find that intermediate combined-but-unwritten result in the cache -- or it might send the request recursively into pagespeed, which will then decode it again. I believe there is already code that does something like this, maybe in net/instaweb/rewriter/rewrite_driver.cc.
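The first-write-wins core of what josh describes can be sketched in isolation. This is a hypothetical class, far simpler than a real RacingAsyncFetch would be (which would also have to buffer headers and body per candidate fetch); it only demonstrates dropping every completion after the first:

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <utility>

// Hypothetical first-write-wins sink: forwards only the first completed
// fetch to the downstream callback and silently drops later completions.
class FirstWriteWins {
 public:
  explicit FirstWriteWins(std::function<void(const std::string&, bool)> done)
      : done_(std::move(done)) {}

  // Each candidate fetch calls this once when it finishes.
  void Done(const std::string& body, bool success) {
    bool expected = false;
    // Atomically claim the right to deliver; losers fall through and are dropped.
    if (claimed_.compare_exchange_strong(expected, true)) {
      done_(body, success);
    }
  }

 private:
  std::atomic<bool> claimed_{false};
  std::function<void(const std::string&, bool)> done_;
};
```

Hanging one of these between the racing fetches and the downstream callback would resolve the double-write problem from the earlier comment.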
@augustoroman I'm curious: did you manage to make progress on this? Anything we can do to help out?
We suffer from this problem with random images that probably aren't processed to webp in time, which end up as 404s. It would be really helpful if mod_pagespeed would, in this case, just deliver the original resource and not break the page.
As mentioned above, there is a timeout you can adjust to make this happen less often: BlockingFetch has an undocumented timeout (default 5 seconds, settable in pagespeed.conf). There is a TODO to document it...
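The example directive line appears to have been lost from this comment. Assuming the option is exposed under the name discussed later in the thread, ModPagespeedBlockingFetchTimeoutMs (a later comment questions whether it is actually usable from a config file), the missing line would presumably be:

```apache
ModPagespeedBlockingFetchTimeoutMs 10000
```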
would set it to 10 seconds.
@jmarantz any news on this one? We're hitting this problem and
I don't know the ModPagespeedBlockingFetchTimeoutMs, but maybe it is the same as
@Lofesa unfortunately setting
As far as I can see, ModPagespeedBlockingFetchTimeoutMs is in the code, but it doesn't have a "name" to use in a config file. Maybe @jmarantz can tell us how to use it (if possible).
Original issue report

Version: 1.12.34.2
While optimizing a css file that references many other images, pagespeed is 404'ing subsequent requests for that resource.
Specifically:
--- so far, so good ---
In this case, the css file takes several minutes to optimize, so we get tons of these 404s until somebody gets lucky and the CDN caches the optimized result.
It's even worse if the css file takes so long to optimize that pagespeed decides to refresh the content before serving the optimized result.
A workaround is to disable optimization for megahuge.css.
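This workaround can be expressed in pagespeed.conf with the standard Disallow directive; a sketch (the wildcard pattern is an assumption about where megahuge.css lives on the site):

```apache
ModPagespeedDisallow "*/megahuge.css"
```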
Original groups discussion