Skip to content
This repository was archived by the owner on Apr 10, 2025. It is now read-only.

RespectVary ignored when cache-extending #1373

Closed
eljoest opened this issue Aug 9, 2016 · 19 comments
Closed

RespectVary ignored when cache-extending #1373

eljoest opened this issue Aug 9, 2016 · 19 comments

Comments

@eljoest
Copy link

eljoest commented Aug 9, 2016

X-Mod-Pagespeed: 1.11.33.2-0 on apache

I'm seeing partial rewrites of the HTML src attributes.

  • Images that can be reduced in size will get a new URL with extended cache times, e.g. "x[...].png.pagespeed.ic.Br3BCqRA0i.png".
  • Images (JPEG, PNG) that are not smaller after optimization will not be rewritten and thus served with the original URL and therefore TTL.

The issue can be reproduced with the request parameters ?ModPagespeed=on&ModPagespeedFilters=extend_cache,rewrite_images
but also happens with the production filters:

ModPagespeedEnableFilters extend_cache,remove_comments,rewrite_images,collapse_whitespace,prioritize_critical_css,fallback_rewrite_css_urls,defer_javascript,combine_css,combine_javascript,rewrite_css,rewrite_domains

Original headers of the various images don't differ regarding cache info. This one fails but can't be compressed further:

Cache-Control: public, max-age=43200
Connection: close
Date: Tue, 09 Aug 2016 09:40:47 GMT
Accept-Ranges: bytes
ETag: "5327-5358c48346b4e"
Vary: Origin
Content-Length: 21287
Content-Type: image/jpeg
Expires: Thu, 08 Sep 2016 09:04:17 GMT
Last-Modified: Sat, 18 Jun 2016 12:05:06 GMT
X-Content-Type-Options: nosniff

This one works but is getting reduced to 6204 bytes:

Cache-Control: public, max-age=43200
Date: Tue, 09 Aug 2016 09:44:15 GMT
Accept-Ranges: bytes
ETag: "1841-5357e9b4cf669"
Vary: Origin
Content-Length: 6209
Content-Type: image/jpeg
Expires: Thu, 08 Sep 2016 09:04:17 GMT
Last-Modified: Fri, 17 Jun 2016 19:46:11 GMT
X-Content-Type-Options: nosniff

In mod_pagespeed_message I can see:

[page url]:163: Recompressing image `[original image URL].jpg' (21287 -> 21287 bytes) doesn't save space; dropped.
ipro: Recompressing image `[original image URL].jpg' (21287 -> 21287 bytes) doesn't save space; dropped.
Shrink of image `[...].png' (4887 -> 5782 bytes) doesn't save space; dropped.
@oschaaf
Copy link
Member

oschaaf commented Aug 16, 2016

@eljoest Could you post the html source you receive when you also turn on the debug filter?:
?ModPagespeed=on&ModPagespeedFilters=extend_cache,rewrite_images,debug

@eljoest
Copy link
Author

eljoest commented Aug 22, 2016

Sorry for the long delay, the relevant section shows:

<a href="/uploads/pics/OBD4HDD.jpg" class="fbox" rel="fb" target="_blank"><img src="/typo3temp/_processed_/csm_OBD4HDD_4d2d95dc52.jpg" width="240" height="170" alt="" data-pagespeed-url-hash="1636956066" onload="pagespeed.CriticalImages.checkImageForCriticality(this);"/><!--Image does not appear to need resizing.--><!--The image was not inlined because it has too many bytes.--></a>

full source

@jeffkaufman
Copy link
Contributor

I just tried reproducing this naively on a test site, and I do see images getting rewritten to be cache extended:

<img src=optimized.jpg.pagespeed.ce.0aKT3Yvs2R.jpg>

Copying /typo3temp/_processed_/csm_OBD4HDD_4d2d95dc52.jpg locally, I also see it getting cache extended, so it's not a problem with the image:

<img src=csm.jpg.pagespeed.ce.d9AGCf-0p-.jpg><!--Image does not appear to need resizing.--><!--The image was not inlined because device does not support inlinling.-->

The problem isn't that PageSpeed can't fetch the image or something, since it's able to tell that it doesn't need resizing, and that it's too large to inline. Additionally, when fetched from Chrome it can make a webp:

xcsm_OBD4HDD_4d2d95dc52.jpg.pagespeed.ic.y7s9JIrRdq.webp

If I load your page with extend_cache_images,debug, I still don't see it getting cache-extended, so it doesn't look like a problem with different filters interacting badly:

<img src="/typo3temp/_processed_/csm_OBD4HDD_4d2d95dc52.jpg" width="240" height="170" alt=""/>

Copying the full tag also doesn't fix it:

<div class="news-single-img"><a href="uploads/pics/OBD4HDD.jpg" class="fbox" rel="fb" target="_blank"><img src="typo3temp/_processed_/csm_OBD4HDD_4d2d95dc52.jpg.pagespeed.ce.d9AGCf-0p-.jpg" width="240" height="170" alt=""/></a><p class="news-single-imgcaption" style="width:240px;"></p></div>

I'm at a bit of a loss. Is there anything mentioning this image in your apache log file?

@morlovich
Copy link
Contributor

How good are the original caching headers? If it's already cacheable for at
least a month we won't touch it.
Unusual mimetypes may matter, too.

On Wed, Aug 24, 2016 at 3:10 PM, Jeff Kaufman notifications@github.com
wrote:

I just tried reproducing this naively on a test site, and I do see images
getting rewritten to be cache extended:

Copying /typo3temp/processed/csm_OBD4HDD_4d2d95dc52.jpg locally, I also
see it getting cache extended, so it's not a problem with the image:

The problem isn't that PageSpeed can't fetch the image or something, since
it's able to tell that it doesn't need resizing, and that it's too large to
inline. Additionally, when fetched from Chrome it can make a webp:

xcsm_OBD4HDD_4d2d95dc52.jpg.pagespeed.ic.y7s9JIrRdq.webp

If I load your page with extend_cache_images,debug, I still don't see it
getting cache-extended, so it doesn't look like a problem with different
filters interacting badly:

Copying the full tag also doesn't fix it:

I'm at a bit of a loss. Is there anything mentioning this image in your
apache log file?


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#1373 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ADl1ROl2LdmAGepGLz3BaTPi32O2apypks5qjJc1gaJpZM4Jf5w6
.

@jeffkaufman
Copy link
Contributor

The origin response headers, with ?PageSpeed=off and requested from Firefox, are:

Accept-Ranges: bytes
Cache-Control: public, max-age=43200
Connection: keep-alive
Content-Length: 22962
Content-Type: image/jpeg
Date: Wed, 24 Aug 2016 19:32:28 GMT
Etag: "59b2-5357a6d22953f"
Expires: Fri, 23 Sep 2016 19:32:28 GMT
Last-Modified: Fri, 17 Jun 2016 14:46:56 GMT
Server: nginx
Vary: Origin
X-Backend-Server: w3ext1

Normal mime, max-age isn't that far out.

@jeffkaufman
Copy link
Contributor

It is something about the headers, though! If I set up my local server with:

    pagespeed Domain https://www.tracetronic.de;

And reference the image directly:

<img src="https://www.tracetronic.de/typo3temp/_processed_/csm_OBD4HDD_4d2d95dc5\
2.jpg">

Then I get xcsm_OBD4HDD_4d2d95dc52.jpg.pagespeed.ic.y7s9JIrRdq.webp in chrome, but if I turn off convert_webp_to_jpeg I get csm_OBD4HDD_4d2d95dc52.jpg.

Very strange!

@jeffkaufman
Copy link
Contributor

reproduced locally with netcat; now trying to find out which headers are responsible

@jeffkaufman
Copy link
Contributor

It's the Vary: Origin header that's doing it.

If I have no Vary header, then it works properly.

If I have Vary: Cookie or Vary: Accept then I get "Uncacheable content, preventing rewriting" (and it also won't make a webp)

If I have an unknown Vary header like Vary: Cabbage then I get the same strange behavior as Vary: Origin.

@jeffkaufman
Copy link
Contributor

Vary: User-Agent also has the same strange behavior as Vary: Origin and Vary: Cabbage.

I'm guessing this is something weird about how RespectVary is implemented: https://developers.google.com/speed/pagespeed/module/configuration#respectvary

@eljoest
Copy link
Author

eljoest commented Aug 25, 2016

@jeffkaufman Thanks a lot for the thorough analysis. I have removed Vary: Origin for the images (mostly used for js/"static" ajax) and after wiping the cache it works as expected with my configuration.
I haven't set a value for ModPagespeedRespectVary, so it seems to be a weird behavior with the default setting.

@jeffkaufman
Copy link
Contributor

jeffkaufman commented Aug 25, 2016

Reproduced this locally:

$ PUZZLE_ORIG=localhost:8050/mod_pagespeed_example/images/Puzzle.jpg
$ curl $PUZZLE_ORIG > /dev/null ; sleep 1 ; curl $PUZZLE_ORIG > /dev/null ; sleep 1
$ curl $PUZZLE_ORIG > mod_pagespeed_test/Puzzle-optimized.jpg
$ echo '<img src=Puzzle-optimized.jpg>' > mod_pagespeed_test/Puzzle-optimized.html

Add to test/pagespeed_test.conf.template:

    location /mod_pagespeed_test/Puzzle-optimized.jpg {
      add_header Vary Origin;
    }

Now see that it's not optimized normally:

$ curl localhost:8050/mod_pagespeed_test/Puzzle-optimized.html
<head/><img src=Puzzle-optimized.jpg>

But we'll still turn it into a webp:

$ curl -H 'Accept: image/webp' localhost:8050/mod_pagespeed_test/Puzzle-optimized.html?PageSpeedFilters=+convert_jpeg_to_webp
<head/><img src=xPuzzle-optimized.jpg.pagespeed.ic.jjOWf4PvS4.webp>

And we don't actually have to start with an optimized image; it's enough to ask about cache-extension:

    location /mod_pagespeed_example/images/Puzzle.jpg {
      add_header Vary Origin;
    }

$ curl localhost:8050/mod_pagespeed_test/Puzzle.html
<head/><img src=http://localhost:8050/mod_pagespeed_example/images/xPuzzle.jpg.pagespeed.ic.YsUBdV-i_6.jpg>
$ curl localhost:8050/mod_pagespeed_test/Puzzle.html?PageSpeedFilters=extend_cache
<img src=../mod_pagespeed_example/images/Puzzle.jpg>

So: it looks like it we have a resource with a Vary header then we won't cache extend it, but if we can optimize it at all then we will do so, which includes cache-extending it.

@jeffkaufman
Copy link
Contributor

Here's a unit test that captures this:

TEST_F(CacheExtenderTest, VaryOrigin) {
  options()->EnableExtendCacheFilters();
  rewrite_driver()->AddFilters();

  GoogleString url = AbsolutifyUrl("a.css");
  ResponseHeaders response_headers;
  DefaultResponseHeaders(kContentTypeCss, 100 /* ttl */, &response_headers);
  response_headers.Add("Vary", "Origin");
  SetFetchResponse(url, response_headers, "css file");

  GoogleString cache_extended_css = Encode("", kFilterId, "0", "a.css", "css");
  ValidateExpected("vary origin",
                   StringPrintf(kCssFormat, "a.css"),
                   StringPrintf(kCssFormat, cache_extended_css.c_str()));
}

@jeffkaufman
Copy link
Contributor

To decide whether it's ok to cache-extend, cache_extender.cc checks headers->IsProxyCacheable(), which I see returning false if I set Vary: Origin. Our optimization filters probably do something else?

@jeffkaufman
Copy link
Contributor

CacheableResourceBase::IsValidAndCacheableImpl calls headers->IsProxyCacheable(req_properties, respect_vary(), ResponseHeaders::kNoValidator) while cache_extender.cc calls headers->IsProxyCacheable(). And "The zero-arg version of IsProxyCacheable gives a pessimistic answer, assuming the request has cookies, there is no validator, and we respect Vary."

So the problem is that in cache extending we ignore the RespectVary configuration setting, and just assume the value is true, which is especially confusing since the default value is false.

@jeffkaufman
Copy link
Contributor

All the calls to the zero-arg version of IsProxyCachable are suspect; I'll go through them.

@jeffkaufman
Copy link
Contributor

Cases to fix:

  • CacheExtender::RewriteLoadedResource: easy to pass in respect_vary from options
  • various ones in tests: not a problem
  • RewriteDriver::Write(): this operates on our output headers for rewritten resources. Those shouldn't have Vary headers that make them uncachable, but I guess if they do we should respect them since we created them.

@jeffkaufman
Copy link
Contributor

@eljoest

I haven't set a value for ModPagespeedRespectVary, so it seems to be
a weird behavior with the default setting.

Yes, it's a bug, and I'm working on a patch.

Thanks for reporting this!

@jmarantz
Copy link
Contributor

quick question -- should we get rid of the zero-arg version of
IsProxyCacheable()?

On Thu, Aug 25, 2016 at 4:04 PM, Jeff Kaufman notifications@github.com
wrote:

Closed #1373 #1373 via
dfda935
dfda935
.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#1373 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AB2kPbv_jac4PBOy4ixR4Aw6I02VugZRks5qjfVUgaJpZM4Jf5w6
.

crowell pushed a commit that referenced this issue Aug 25, 2016
By default we don't respect Vary headers we find on resources.  People who do want us to respect them can set 'RespectVary on' in their configuration.  The cache-extender was not considering the value of the RespectVary option in deciding whether things were cachable, and was instead assuming RespectVary=true.

Fixes #1373
crowell pushed a commit that referenced this issue Aug 25, 2016
By default we don't respect Vary headers we find on resources.  People who do want us to respect them can set 'RespectVary on' in their configuration.  The cache-extender was not considering the value of the RespectVary option in deciding whether things were cachable, and was instead assuming RespectVary=true.

Fixes #1373
@jeffkaufman
Copy link
Contributor

@jmarantz The zero-arg version makes sense in the places it's still being used: (1) tests and (2) handling resources pagespeed creates (where respect-vary shouldn't apply).

@jeffkaufman jeffkaufman changed the title extend_cache doesn't work when image rewriting doesn't save space Don't RespectVary when cache extending either Oct 3, 2016
@jeffkaufman jeffkaufman changed the title Don't RespectVary when cache extending either RespectVary ignored when cache-extending Oct 11, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants