webp transcoding should trigger off of Accept headers and issue Vary:Accept #817

Closed
GoogleCodeExporter opened this Issue Apr 6, 2015 · 21 comments

GoogleCodeExporter commented Apr 6, 2015

Currently, in_place_optimize_for_browser does not work at all in mod_pagespeed 
(Issue 816).  However, if it did, I believe it would trigger on User-Agent, 
which is impractical to deploy.

The reason is that in the in-place flow we are not changing URLs.  To avoid 
having a proxy-cache serve a webp to Firefox, we would need to issue 
Vary:User-Agent, and proxy-caches would need to respect Vary:User-Agent.  Proxy 
caches would then become ineffective, because the set of distinct User-Agent 
strings is effectively unbounded.

To make it feasible to deploy a workable solution to this, Chrome now emits an 
Accept header that includes 'webp', whereas FF does not:

Chrome:  text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Firefox: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8


In fact, this policy should ideally be extended to all webp transcoding 
decisions, and is not specific to in_place_optimize_for_browser.  If we 
html-rewrite an image tag for Chrome into a .webp, and a user saves the URL and 
sends it to someone who pastes it in their web page, and that web page is 
viewed from FF, that will break.

We should fix that by checking Accept headers when responding to the .webp URL 
request, and send back jpeg content if the Accept header does not include 
"image/webp".

There are two cases:
  1. A URL like 256x192xPuzzle.jpg.pagespeed.ic.JZAntB0c9U.webp.  We can do something intelligent here
     and transform the URL to 256x192xPuzzle.jpg.pagespeed.ic.JZAntB0c9U.jpg, saving the optimized jpg
     in our metadata and http caches.
  2. An origin URL that was ipro-transcoded.  In this case I think the software should do the correct
     thing naturally if we are incorporating the Accept:webp bit into our metadata cache key.

There's a third case where the origin site specified a webp, and in that case I 
think we should just serve it.  We do not need to implement transcoding of webp 
to jpeg.
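
The Accept check and the two rewritten-URL cases above can be sketched roughly as follows. This is only an illustration in Python, not the mod_pagespeed implementation: the function names `accepts_webp` and `url_to_serve` are hypothetical, and the sketch assumes a rewritten URL can be recognised by the `.pagespeed.` marker and a `.webp` suffix. A bare `*/*` is deliberately not treated as webp support, since the proposal is to look for "image/webp" explicitly.

```python
def accepts_webp(accept_header):
    """Return True if the Accept header lists image/webp with a nonzero q-value."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "image/webp":
            continue
        q = 1.0  # q defaults to 1 when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def url_to_serve(requested_url, accept_header):
    """Pick the resource to serve for a requested image URL (hypothetical helper).

    A rewritten .pagespeed. URL ending in .webp falls back to its .jpg twin
    when the client does not advertise image/webp. Any other URL, including
    a webp that the origin site itself specified, is served unchanged.
    """
    is_rewritten_webp = (
        ".pagespeed." in requested_url and requested_url.endswith(".webp")
    )
    if is_rewritten_webp and not accepts_webp(accept_header):
        return requested_url[: -len(".webp")] + ".jpg"
    return requested_url
```

With the Firefox header quoted earlier, `url_to_serve("256x192xPuzzle.jpg.pagespeed.ic.JZAntB0c9U.webp", ...)` yields the `.jpg` URL, while the Chrome header leaves the `.webp` URL alone.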

Original issue reported on code.google.com by jmara...@google.com on 7 Nov 2013 at 3:00

GoogleCodeExporter commented Apr 6, 2015

Also of note, Opera Mini on my Nexus 4 gives this Accept header:

Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1

I found that via http://myhttp.info/

So scanning for "image/webp" in the Accept header will work fine here as well. 
Alas, there will be some fragmentation in proxy caches, as that Accept header 
differs in text from the Chrome one.  However, in our metadata cache I think we 
should only be putting one bit, "webp", into the key.
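
A rough sketch of the "one bit in the key" idea, in Python for illustration only. The key layout and the name `metadata_cache_key` are assumptions made for this example, not the actual metadata-cache format; the point is just that differently worded Accept headers collapse to the same key.

```python
def metadata_cache_key(url, accept_header):
    """Build a cache key that records only whether the client accepts webp.

    The "#w" / "#-" suffix is a made-up encoding for this sketch.
    """
    webp_bit = "w" if "image/webp" in accept_header.lower() else "-"
    return f"{url}#{webp_bit}"


CHROME_ACCEPT = (
    "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
)
OPERA_MINI_ACCEPT = (
    "text/html, application/xml;q=0.9, application/xhtml+xml, image/png, "
    "image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1"
)
FIREFOX_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

# Chrome and Opera Mini share one metadata-cache entry; Firefox gets another.
same_key = metadata_cache_key("a.jpg", CHROME_ACCEPT) == metadata_cache_key(
    "a.jpg", OPERA_MINI_ACCEPT
)
```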

Original comment by jmara...@google.com on 7 Nov 2013 at 3:30

GoogleCodeExporter commented Apr 6, 2015

Original comment by jmara...@google.com on 7 Nov 2013 at 3:59

GoogleCodeExporter commented Apr 6, 2015

Perhaps one more thing to consider... while the whole point of Accept is to get 
away from UA sniffing, unfortunately we may still have to use it in some cases 
(*cough* IE *cough*). 

Specifically, IE doesn't cache outbound headers, hence it doesn't "respect" 
Vary (with the exception of gzip, which is a special case). As a result, any 
Vary: <val> response will have to be revalidated in IE, which is suboptimal. A 
workaround is to emit Cache-Control: private for IE user-agents. For more 
details see "IE is special": 
http://www.igvita.com/2013/05/01/deploying-webp-via-accept-content-negotiation/

Original comment by igrigorik@chromium.org on 7 Nov 2013 at 4:09

GoogleCodeExporter commented Apr 6, 2015

I'm not following you on the issue of IE's caching of outbound headers.  We 
won't send webp content to IE in the first place: all the varying will be done 
upstream of IE.

To your general point, I don't see a huge conceptual difference between UA 
sniffing and Accept sniffing, except that the latter appears to be somewhat 
less fragmented, making it feasible to include in proxy cache key.  It's still 
a bit fragmented because the Chrome "Accept" and the Opera "Accept" have 
exactly the same implication for our purposes, but different bytes.

Hopefully, multiple different versions of Opera that accept webp will give us 
precisely the same "Accept" header.  And multiple versions of Chrome that 
accept webp will give us precisely the same "Accept" header.  So we'll have 
just two.

I think the main difference is that UA incorporates a bunch of bits that vary 
quite a bit that we don't care about, and Accept appears to incorporate fewer 
bits that vary quite a lot.
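
To make the fragmentation argument concrete, here is a toy count of how many copies of one URL a Vary-respecting proxy cache would hold. It is a sketch in Python; the User-Agent strings are placeholders invented for the example, and a real proxy keys on more than this.

```python
CHROME_ACCEPT = (
    "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
)
OPERA_ACCEPT = (
    "text/html, application/xml;q=0.9, application/xhtml+xml, image/png, "
    "image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1"
)

# Placeholder User-Agent values, not real browser strings.
SAMPLE_REQUESTS = [
    {"User-Agent": "ExampleChrome/30", "Accept": CHROME_ACCEPT},
    {"User-Agent": "ExampleChrome/31", "Accept": CHROME_ACCEPT},
    {"User-Agent": "ExampleOpera/12", "Accept": OPERA_ACCEPT},
]


def proxy_cache_entries(requests, vary_on):
    """Count distinct variants a proxy stores for one URL under `Vary: <vary_on>`."""
    return len({req.get(vary_on, "") for req in requests})


entries_by_ua = proxy_cache_entries(SAMPLE_REQUESTS, "User-Agent")
entries_by_accept = proxy_cache_entries(SAMPLE_REQUESTS, "Accept")
```

Under these assumptions every browser version gets its own entry with Vary: User-Agent, while Vary: Accept yields one entry per distinct Accept string.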

Original comment by jmara...@google.com on 7 Nov 2013 at 4:18

GoogleCodeExporter commented Apr 6, 2015

Regardless of which file you serve to IE, you have to return a "Vary: Accept" 
in the response such that the upstream proxy knows to use the Accept header as 
part of its cache key. The issue is, IE doesn't store its outbound headers, so 
while it can cache the response, it doesn't know if it can use it... hence it 
must revalidate it. 

That said, I guess this is actually a double gotcha: omitting Vary: Accept and 
replacing it with CC: private may blow out the upstream cache of the Vary 
response. Hmm. =|

Original comment by igrigo...@google.com on 7 Nov 2013 at 4:23

GoogleCodeExporter commented Apr 6, 2015

Does IE have to respect Vary? It seems like it would use the same Accept header 
for every request and thus could ignore Vary: Accept. Right? The Vary: Accept 
is just for proxy caches, right?

Original comment by sligocki@google.com on 7 Nov 2013 at 4:29

GoogleCodeExporter commented Apr 6, 2015

Good news, I think.  IE9's Accept header is: text/html, application/xhtml+xml, 
*/*

Thus a proxy-cache that supports Vary:Accept will not deliver a result generated 
by MPS/NPS for Chrome, Opera, or Firefox to IE.  It will pass the request 
through to MPS/NPS.

So we can UA-sniff and do Ilya's server-side tweak: if IE then drop Vary and 
add cc:private.

If there are well-behaved proxy caches (CDN, Varnish) under the site's control 
then it can move the Vary:Accept-stripping for IE to the proxy cache, and turn 
it off in MPS/NPS:

   ModPagespeedDropVaryAcceptForIE off
   pagespeed DropVaryAcceptForIE off;

I propose we default this to 'on' based on the assumption that most MPS/NPS 
users will not have any proxy caches under their control.  And the ones that do 
can make IE performance better by explicitly implementing "if IE then drop 
Vary and add cc:private" in the proxy cache, and can turn off that behavior in 
MPS/NPS.
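
The "if IE then drop Vary and add cc:private" tweak might look roughly like this. It is a Python sketch under assumptions: `finalize_headers` and its `drop_vary_accept_for_ie` flag are hypothetical names mirroring the option proposed here, the IE test is a naive substring match on the User-Agent, and replacing any existing `public` directive is a choice made for this example.

```python
def finalize_headers(headers, user_agent, drop_vary_accept_for_ie=True):
    """Return response headers, applying the proposed IE tweak when enabled."""
    out = dict(headers)
    # Naive IE detection, for illustration only.
    is_ie = "MSIE" in user_agent or "Trident/" in user_agent
    if not (drop_vary_accept_for_ie and is_ie and out.get("Vary") == "Accept"):
        return out

    # Drop Vary so IE can reuse its cached copy without revalidating...
    del out["Vary"]
    # ...and mark the response private so shared caches do not store it.
    existing = [d.strip() for d in out.get("Cache-Control", "").split(",")]
    directives = [d for d in existing if d and d.lower() != "public"]
    if "private" not in (d.lower() for d in directives):
        directives.append("private")
    out["Cache-Control"] = ", ".join(directives)
    return out
```

For example, a response carrying `Vary: Accept` and `Cache-Control: max-age=300` would go to an IE9 client as `Cache-Control: max-age=300, private` with no Vary header, and would be left untouched for other browsers or when the flag is off.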

Original comment by jmara...@google.com on 7 Nov 2013 at 4:45

GoogleCodeExporter commented Apr 6, 2015

@sligocki: No, the problem is different.

(a) IE sees the response with "Vary: X" and Cache-Control: max-age=Y
(b) IE stores said resource in its cache with Y ttl expiry
(c) Some time later, we ask for the same resource and IE queries its cache. Say 
Y is still valid. BUT it also sees that there is a Vary attached to the 
resource, and sadly, IE does not store the Accept header it sent to the server. 
So, what does it do? The only thing that's safe: it has to send a revalidation 
request to the server. 

In case of Accept, I could have sent an XHR with a custom Accept header (e.g. 
json vs xml, jpeg vs webp), so it has to perform the revalidation every single 
time. Hooray for IE. 
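
Steps (a) to (c) can be modelled with a toy cache lookup. This is a simplified sketch in Python of the behaviour described in this comment, not of IE's actual cache; the entry layout and the name `cache_decision` are invented for the example, and the gzip special case is not modelled.

```python
def cache_decision(entry, now, stores_request_headers, request_headers=None):
    """Decide what a browser cache does with a stored response.

    entry: dict with "stored_at", "max_age", optional "vary" (a header name)
           and optional "request_headers" (what was sent originally).
    Returns "use-cached", "revalidate", or "fetch".
    """
    if now >= entry["stored_at"] + entry["max_age"]:
        return "revalidate"  # TTL expired
    vary = entry.get("vary")
    if not vary:
        return "use-cached"  # fresh and not negotiated
    if not stores_request_headers:
        # IE-like: it cannot tell whether the cached variant matches this
        # request, so the only safe option is to ask the server.
        return "revalidate"
    original = entry.get("request_headers", {}).get(vary)
    current = (request_headers or {}).get(vary)
    return "use-cached" if original == current else "fetch"
```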

Original comment by igrigo...@google.com on 7 Nov 2013 at 4:56

GoogleCodeExporter commented Apr 6, 2015

Original comment by jmara...@google.com on 12 Nov 2013 at 3:48

  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect

GoogleCodeExporter commented Apr 6, 2015

You can find Accept info for any browser (including mobile) by visiting this 
site:

http://myhttp.info

Original comment by jmara...@google.com on 13 Nov 2013 at 4:11

GoogleCodeExporter commented Apr 6, 2015

See also: Issue 846.

Original comment by jmara...@google.com on 27 Nov 2013 at 4:30

GoogleCodeExporter commented Apr 6, 2015

Another fly-in-the-ointment: the Android Browser (not Chrome).  That browser 
supports webp but it doesn't auto-update so fast and I don't know if *any* 
versions of it send an accept header that includes "image/webp".

So I think some form of UA-scraping will still be needed or we will regress 
performance on Android browsers which have significantly higher market share 
than Chrome on Android currently.

This, in addition to what to do with IE, requires careful design.

Original comment by jmara...@google.com on 27 Nov 2013 at 11:14

GoogleCodeExporter commented Apr 6, 2015

I verified that Opera for Desktop started to send out "Accept: image/webp" 
since version 11.10. The version was released on April 12, 2011 and was the 
first version supporting WebP lossy. WebP lossless was supported later, from 
version 12.

Original comment by hui...@google.com on 10 Dec 2013 at 4:06

GoogleCodeExporter commented Apr 6, 2015

Original comment by jmara...@google.com on 23 Jan 2014 at 4:13

GoogleCodeExporter commented Apr 6, 2015

Original comment by jmara...@google.com on 31 Jan 2014 at 6:37

GoogleCodeExporter commented Apr 6, 2015

Just to clarify: do we think we need to do this only for IPRO resources, or is 
this an issue for all webp-rewritten images (due to the issue of url 
resharing)?  In the latter case we're not sending any special cache control or 
vary headers.

If we only care about IPRO then I'll restrict my testing to that case.

Original comment by jmaes...@google.com on 31 Jan 2014 at 11:03

GoogleCodeExporter commented Apr 6, 2015

Let's see.

To solve the general problem of deploying transcoding to webp in ipro, we need 
to rely on Vary:Accept as it is not reasonable to ask proxies to support 
Vary:UA.

However, we might want to add Vary:Accept to rewritten URLs as well, because 
that will allow someone to save an image link from Chrome and mail it to 
someone who will look at it on Firefox.

The simplest thing, then, is to switch completely off of UA-sniffing for lossy 
webp, and rely entirely on Accept for webp support.

But we would still need UA-sniffing for lossless support, and link-sharing 
would not work for sites with that feature enabled.

Original comment by jmara...@google.com on 31 Jan 2014 at 11:10

GoogleCodeExporter commented Apr 6, 2015

Just to keep this bug up to date with our present knowledge: I'm updating 
in-place transcoding of images to webp to use Vary:Accept and key only off the 
accept header rather than the User-Agent.

However, we've run into an interesting problem with HTML and CSS files that 
refer to images: Chrome at least does not send Accept: webp unless it is 
fetching an image, so we can't key off that header for rewritten HTML and CSS 
at present.  We currently rely on url rewriting for CSS, which should be safe 
– browsers don't generally make background images shareable, so the question 
of url sharing for rewritten CSS images is moot.  Basically, the HTML will 
refer to CSS and image files that are appropriate for the browser type.

But this means we'll turn on image url preservation mode when rewriting a CSS 
file in place (without altering its url), as the alternative is to serve the 
CSS as Vary: User-Agent which is effectively un-proxy-cacheable.  This will 
take a bit longer to land, but I'm hoping to have it in by later this week.

Original comment by jmaes...@google.com on 10 Feb 2014 at 4:20

GoogleCodeExporter commented Apr 6, 2015

We now insert Vary: Accept headers for in-place optimized images and serve webp 
images when that is possible.  Still under way is work to reduce the incidence 
of Vary: headers for images that *cannot* be transcoded to webp.

Original comment by jmaes...@google.com on 3 Mar 2014 at 4:36

GoogleCodeExporter commented Apr 6, 2015

r3843 and r3800 provide this functionality for Ipro images, and insert the 
Vary: Accept headers only when they're actually necessary.

Note, however, that rewritten urls will be served without the Vary: headers.  
This was an explicit decision on our part: we don't want to undermine proxy 
cacheability of rewritten resources unless we're preserving the underlying url.
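
Read together with the earlier comments, the resulting policy could be summarised as below. This Python sketch is one interpretation, not the code from r3843/r3800: it assumes "necessary" means the URL is preserved (Ipro) and the image is one that can be transcoded to webp, and the name `vary_header_for` is hypothetical.

```python
def vary_header_for(url_preserved, webp_transcodable):
    """Return the Vary value to emit for an optimized image, or None.

    url_preserved: True for in-place (Ipro) responses that keep the origin URL.
    webp_transcodable: True if the image could be served as webp to some clients.
    """
    if url_preserved and webp_transcodable:
        return "Accept"
    # Rewritten (.pagespeed.) URLs and images that cannot become webp
    # are served without Vary, to keep them proxy-cacheable.
    return None
```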

Original comment by jmaes...@google.com on 11 Mar 2014 at 8:35

GoogleCodeExporter commented Apr 6, 2015

Original comment by jmaes...@google.com on 18 Apr 2014 at 7:13

  • Changed state: Fixed
  • Added labels: Milestone-v31