{{ message }}

# Blacklist Windows Browser for transcoding to webp #978

Closed
opened this issue Apr 6, 2015 · 33 comments
Closed

# Blacklist Windows Browser for transcoding to webp#978

opened this issue Apr 6, 2015 · 33 comments
Labels

### GoogleCodeExporter commented Apr 6, 2015

 Webp support is now defined using whitelist and blacklist of browsers' versions (\src\pagespeed\kernel\http\user_agent_matcher.cc) It makes sense to rely on the "image/webp" header and do feature detection instead of browser detection. supports_webp_ variable has specific type FastWildcardGroup that allows to return bool values using Match function that takes White/Black lists into account. bool UserAgentMatcher::SupportsWebp function returns supports_webp_.Match(user_agent, false); \src\net\instaweb\rewriter\device_properties.cc contains sample usage of HasValue function that is available for headers analysis: accepts_webp_ = request_headers.HasValue(HttpAttributes::kAccept, kContentTypeWebp.mime_type()) ? kTrue : kFalse; So, I think we can use HasValue function to return bool value from SupportsWebp function basing on request headers values. There is also separate webp-related parameter supports_webp_lossless_alpha_ that is handled the same way as supports_webp_ now. Not sure is it possible to define required value of this parameter using current headers. There is also the following code: const char UserAgentMatcher::kTestUserAgentWebP[] = "test-user-agent-webp"; // Note that this must not contain the substring "webp". const char UserAgentMatcher::kTestUserAgentNoWebP[] = "test-user-agent-no"; but as I can see these variables are not used. However, in case it would be necessary to update them, we can place headers values instead of the hardcoded strings. This should work as accepted headers of client that does not support webp wouldn't contain "webp" substring.  Original issue reported on code.google.com by evgeny.a...@akvelon.com on 29 Aug 2014 at 6:34 The text was updated successfully, but these errors were encountered:

### GoogleCodeExporter commented Apr 6, 2015

 Hi. Can you clarify what enhancement you are asking for?  Original comment by jmara...@google.com on 29 Aug 2014 at 12:06 Changed state: RequestClarification

### GoogleCodeExporter commented Apr 6, 2015

 Hi, I'm asking for the following enhancement: replace useragent-based detection if client supports "image/webp" format with client accept header-based detection. This should allow to remove white/blacklists for webp and requirement to support these lists up to date for every new version of each useragent  Original comment by v-evg...@microsoft.com on 29 Aug 2014 at 12:19

### GoogleCodeExporter commented Apr 6, 2015

 Hi, mod_pagespeed checks both user agent and "Accept: image/webp" header to determine whether the browser supports WebP. User agent is used because it describes the capability of browser more accurately. WebP is an evolving format. When WebP was introduced, it only supports lossy opaque format. Then it added support to lossless format and alpha format. Now it also supports animated format. Old browsers that were compiled with the old WebP library can't display the newer formats. By checking the user agent, we can know exactly which formats of WebP the browser support so we can rewrite the image to the best format. On the other hand, some browsers started to send out "Accept: image/webp" header when they supported lossy opaque format, the first version of WebP. Therefore, to avoid sending WebP of newer version to browsers compiled with older WebP library, we have to only use lossy opaque format, if we only checks "Accept" header. Some browser, such as the builtin Android browser, doesn't send out "Accept: image/webp" header, although it does support lossy opaque WebP. Therefore, by only checking "Accept" header we can't take adavantage of full potential of WebP. That being said, in some use cases (such as in place resource optimization) we can't rely on user agent and must use "Accept" header, so we need both. In the code, merging WebP support from user agent and "Accept" header is in "DeviceProperties::SupportsWebpRewrittenUrls()" in "device_properties.cc". Variables "kTestUserAgentWebP" and "kTestUserAgentNoWebP" are used in tests, for example in "\src\net\instaweb\rewriter\in_place_rewrite_context_test.cc".  Original comment by hui...@google.com on 29 Aug 2014 at 12:58

### GoogleCodeExporter commented Apr 6, 2015

 [deleted comment] 

### GoogleCodeExporter commented Apr 6, 2015

 Hi - can you give specific examples of IE UA strings that are being served webp images? We want to make sure we're actually covering the real use cases here. I see an impractically-large regex on the cited web page, but no actual UA strings.  Original comment by jmaes...@google.com on 11 Sep 2014 at 5:25

### GoogleCodeExporter commented Apr 6, 2015

  Thanks for the question. I think we are less concerned about one specific string and more about future-proofing against database / pagespeed logic changes. Changes made now to pagespeed now may not necessarily get deployed to sites for some time. This means that each time a new UA string is modified it won’t be lit up on the web until we not only have to wait for the change to get into the module, but then for that module to get updated / deployed across the web. This lags the web behind and fragments it for everyone. We’d love for this library to help prevent this type of future fragmentation by doing feature detection rather than UA sniffing. We are recommending that the logic be changed to first do feature detection (looking for the Accept header), and then use UA refinement to handle what types of WebP formats are accepted. We understand that some older browsers that might support WebP don’t necessarily send the Accept header. If you served original content to those browsers, you still wouldn’t be inadvertently breaking the site’s content by misidentifying a browser and sending WebP content. On the other hand, if you keep the code/logic as it exists now, sites would be broken in new UAs until Pagespeed accepts a change and those changes are deployed. This would happen in a rather fragmented fashion. If this looks good to you, we could submit a patch.  Original comment by n.parash...@gmail.com on 12 Sep 2014 at 9:35

### GoogleCodeExporter commented Apr 6, 2015

 Is IE putting out a UA string with the word 'Chrome' in it? It seems like there might a bunch of servers who see that string 'Chrome' in the UA and then proceed to assume the browser is Chrome. Note also that Chrome didn't always support webp and we already have to match against specific versions of Chrome. We agree that using Accept:image/webp is a better approach in principle but it doesn't fully address our need to discover which features in webp are supported (lossless, animation, etc). It also excludes a lot of old-Android browsers (20% market share; still a bit more than Chrome, according to http://www.netmarketshare.com/) which support webp but don't have that Accept header. That's likely a pretty big impact because those older phones are often running on older networks where the reduced bandwidth is a real benefit. I understand the desire to have windows-phones get UA-sniffed as mobile, but maybe Safari would be a better bet for masquerading because it also doesn't support webp (or other various new technologies that Chrome supports). However we'd be happy to add exclusion-logic for IE based on a regex or substring. Can you share with us the actual UA string? We can try to be construct the logic in as future-proof a fashion as possible. Also note: we already do use Accept:image/webp for image-transcoding in contexts where we cannot change the URL (in-place resource optimization), and must serve webp responses to a jpeg URL, adding Vary:Accept so proxy-caches don't get confused.  Original comment by jmara...@google.com on 13 Sep 2014 at 8:36

### GoogleCodeExporter commented Apr 6, 2015

 Summary was: Webp support detection using headers Also I want to clarify something. We will not blacklist a specific long UA-string to avoid sending webp images to IE. We will add a new pattern to this wildcard list: https://code.google.com/p/modpagespeed/source/browse/trunk/src/pagespeed/kernel/ http/user_agent_matcher.cc#129 And sorry for the confusion -- I called for a regex earlier but actually the UA-matching uses a fast wildcard algorithm that compiles a sequence of allow/disallow wildcards into an IR that can be executed very quickly on every request -- much faster than a long list of regexes. All we need is a new pattern to add to kWebpBlacklist in that file, and we can get this in quick. And it would be good to get this to us quickly. I'm not sure if this will make our next release (1.9) but we should be able to get into the first patch, which is what would go to stable eventually and reach most of our users.  Original comment by jmara...@google.com on 13 Sep 2014 at 11:15 Changed title: Blacklist Windows Browser for transcoding to webp

### GoogleCodeExporter commented Apr 6, 2015

 Ah I was curious and found the above-mentioned site site: http://blogs.msdn.com/b/ie/archive/2014/07/31/the-mobile-web-should-just-work-fo r-everyone.aspx has lots of spew in the comments, but nestled in there I found same data: " For those who have asked, here is the useragent string from the new Windows Phone 8.1 Update from August 4, 2014 on my Nokia Lumia 928: Mozilla/5.0 (Mobile; Windows Phone 8.1; Android 4.0; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 928) like iPhone OS 7_0_3 Mac OS X AppleWebKit/537 (KHTML, like Gecko) Mobile Safari/537 " Assuming this is the right UA it's not Chrome that's being masqueraded, it's Android. IMO that's worse, because a site could be sending you to the "Google Play" for the app, etc. Whatever. But I think to solve this problem for MPS we just have to put "*Windows Phone*" into the webp blacklist. Easy enough.  Original comment by jmara...@google.com on 14 Sep 2014 at 2:18 Changed state: Accepted

### GoogleCodeExporter commented Apr 6, 2015

 Window Phone 8.1, which has been in use since August 3, 2014, includes "Android 4.*" in its user agent, in order to get HTML 5 support. Since Android 4.0+ supports WebP, we transcode images to this format. I've started a change for excluding "Windows Phone" from WebP supported list. [1] http://www.neowin.net/news/ie11-fakes-user-agent-to-fool-gmail-in-windows-phone- 81-gdr1-update [2] http://msdn.microsoft.com/en-us/library/ie/hh869301(v=vs.85).aspx  Original comment by hui...@google.com on 15 Sep 2014 at 1:08

### GoogleCodeExporter commented Apr 6, 2015

 Fixed in https://code.google.com/p/modpagespeed/source/detail?r=4247  Original comment by hui...@google.com on 15 Sep 2014 at 4:29 Changed state: Fixed

### GoogleCodeExporter commented Apr 6, 2015

 I think it impacts not only mobile browsers, but also desktop browsers. Is it ok to update the code such that WebP is "NOT" served to any UA that contains both Chrome "and" Edge tokens ?  Original comment by n.parash...@gmail.com on 16 Sep 2014 at 5:51

### GoogleCodeExporter commented Apr 6, 2015

 I don't think 'Chrome' was in the mobile UA. The problem was the "Android 4.0". I just checked desktop IE 11 and the UA is "Mozilla/5.0 (Windows NT 6.1; Win64; x64; Trident/7.0; rv:11.0) like Gecko" so I don't think we'll have a problem there.  Original comment by jmara...@google.com on 16 Sep 2014 at 5:58

### GoogleCodeExporter commented Apr 6, 2015

 According MSDN, IE for desktop doesn't have "Chrome" nor "Android" in the user agent, so they should be confused. What made you think it's a problem? http://msdn.microsoft.com/en-us/library/ie/hh869301(v=vs.85).aspx  Original comment by hui...@google.com on 16 Sep 2014 at 5:58

### GoogleCodeExporter commented Apr 6, 2015

 The next version of Internet Explorer may have "Chrome" and "Edge" to make more websites compatible.  Original comment by n.parash...@gmail.com on 16 Sep 2014 at 6:27

### GoogleCodeExporter commented Apr 6, 2015

 n.parashuram do you work on IE? How do you know that? Is there an MSDN reference for that plan?  Original comment by jmara...@google.com on 16 Sep 2014 at 6:29

### GoogleCodeExporter commented Apr 6, 2015

 I work for Microsoft Open Technologies and work closely with the IE team. The IE team is considering this and it is the current plan. There is no MSDN article on this yet.  Original comment by n.parash...@gmail.com on 16 Sep 2014 at 6:55

### GoogleCodeExporter commented Apr 6, 2015

 OK can you provide the full UA of the proposed change to desktop IE? What sites don't work on desktop IE now? blacklisting "*Edge*" from being webp-compatible seems too general. I'd rather blacklist on "* Trident/*" or something like that.  Original comment by jmara...@google.com on 16 Sep 2014 at 7:08

### GoogleCodeExporter commented Apr 6, 2015

 The full UA has not been finalized yet. This is the change that I was referring to.  Original comment by n.parash...@gmail.com on 22 Sep 2014 at 5:58 Attachments:

### GoogleCodeExporter commented Apr 6, 2015

 It is weird to me that the word Edge implies a chrome that cannot render webp. Surely there must be something else in the purpose UA that more obviously tells us that this is not the chrome we were looking for...  Original comment by jmara...@gmail.com on 22 Sep 2014 at 6:04

### GoogleCodeExporter commented Apr 6, 2015

 We aren’t looking for a Chrome, but “Chrome” will be in the UA string. Much the same way that Chrome UAs includes “AppleWebKit”, “Safari” and “Gecko” tokens the IE UA will likely include a Chrome token. The supplied patch is looking for UAs that contain both Chrome and Edge tokens. We could reduce this to simply look for “*Edge/*”  Original comment by n.parash...@gmail.com on 22 Sep 2014 at 8:54

### GoogleCodeExporter commented Apr 6, 2015

 'Edge' seems too generic and it's not obvious that won't appear in some non-IE browser. Will 'Trident' not be in the UA string? I'd really like to see the final UA string. Are you sure that 'Chrome' will appear in the desktop UA for IE11? It does not appear (as far as we can tell) in the new mobile IE UA.  Original comment by jmara...@google.com on 22 Sep 2014 at 8:59

### GoogleCodeExporter commented Apr 6, 2015

 I've been watching this thread for a bit. I hope you'll forgive me for not chiming in sooner as I was hoping to better digest the problem space. I'm a senior PM on the performance team on IE. I don't own the UA string decisions nor do I have the final result, but I have seen multiple iterations of the string in various forms. The end goal, of course, is interoperability, as indicated by the MSDN blog post mentioned earlier (here again for reference: http://blogs.msdn.com/b/ie/archive/2014/07/31/the-mobile-web-should-just-work-fo r-everyone.aspx). The string is still being tested with multiple web runs to see which tokens return the best interoperable content for the modern web. As I understand your view, to be most precise, mod_pagespeed relies not just on feature detection with the Accept header, but also on UA sniffing for precision. I can certainly appreciate that approach considering that older versions of Chrome don't support newer versions of WebP. This approach certainly works if UAs remain generally the same for perpetuity and as long as new UA strings or feature variations aren't introduced. The Achilles heal is in update deployments of the module. With over 100,000 stated deployments of modpagespeed, changes to any browser's UA string must not only make it into the blacklist/whitelist but then must wait for deployments to catch up. This is very similar to the type of slow uptake we see with client-side JavaScript libraries/frameworks. I'm afraid the module currently takes content that would otherwise work cross-platform, and makes it Google-centric based on UA strings. "'Edge' seems too generic and it's not obvious that won't appear in some non-IE browser." That may be true, but that's true for any of the strings that you are already testing for. But it seems that even if not generic terms, it isn't clear that those won't show up in non-IE browsers either. As already indicated, "Mozilla", "Gecko", and "Safari" show up in Chrome's UA string. So, I'm not sure that limiting what the IE team decides they want to put in their UA string is going to prevent it from showing up in another UA string. I'm also not sure that Google intends to be the gatekeeper for UA strings for all vendors. In this case, I think the MS Open Tech guys are simply trying to be good stewards in helping modpagespeed get updated ahead of the curve. What is the final UA string? I don't know and I'm not certain that the UA folks know either. Considering the statement above, "All we need is a new pattern to add to kWebpBlacklist in that file, and we can get this in quick. And it would be good to get this to us quickly." I'd like to confirm the pattern stated already is fairly stable. It is is very similar to what we've seen with Chrome -- basically include the kitchen sink of UA tokens -- followed by "Edge/*". Again, I don't know every token or the order of those tokens, but the pattern will definitely include Chrome* and Edge*. So I think the pattern you want will be something like Chrome/*Edge/* This should be all you need to catch the new version. I hope we can work on a better long-term solution that would depend more on Accept headers as the primary means for transcoding, and do the refinement based on UA string. In this way, you'll only ever serve WebP content to those that explicitly state they support it, and you can refine what version of WebP to serve based on the UA string after the fact. This still has it's issues, but would at least prevent the module from improperly transcoding content for browsers that could otherwise consume the original content. Thanks again for your help!  Original comment by tob...@microsoft.com on 22 Sep 2014 at 11:03

### GoogleCodeExporter commented Apr 6, 2015

 Thanks for the perspective. Actually I am not concerned with old Chrome, since Chrome auto-updates the number of old Chromes is increasingly small, and serving them jpegs when even if they could take webp is fine. The problem is old Android native browsers, and the old stock Android-browser doesn't auto-update. Lots of people use old Android phones on low bandwidth connections so they are exactly the people that would benefit most from a more compact file format. So our webp-detection for Android wants to stay as is (modulo the "Windows Phone" blacklist). But I think we might be able to change our webp-capable-detection for Chrome pretty much as you suggest: look first for Accept:image/webp and then at the UA to determine which webp features are available. Hoewver, mod_pagespeed is not the only web-performance-optimization solution. I'm not sure which other ones try to transcode to webp. There is also Strangeloop (Radware), Akamai Aqua Ion, Torbit (now part of Walmart), Yotta, Acceloweb (Limelight) and others. In addition to that, individual sites may also add custom UA-sniffing logic. By masquerading as Chrome it seems like IE is liable to break more than just MPS sites because there are other Chrome features that sites might use. And most such features actually don't have "Accept" headers. Your point is well-taken though: mod_pagespeed may be one of the widest deployed solutions that cannot be instantly updated, whereas a large site sending webp to Chrome via their own custom logic will probably find out quickly that their site is broken when a new version of IE11 comes out, and push a fix. But for the record, 100k sites is understated by openly available estimates, such as http://trends.builtwith.com/Server/mod_pagespeed. And I am still curious why it's necessary for desktop IE to masquerade as Chrome. I can understand mobile IE masquerading as Android, Chrome, or Safari because you want the mobile versions of sites. But what do you get out of masquerading as Chrome on desktop?  Original comment by jmara...@google.com on 23 Sep 2014 at 12:12

### GoogleCodeExporter commented Apr 6, 2015

 No problem, and thanks for the clarification on Chrome vs. Android browsers being the target. That makes sense. I'm wondering if, given that these are older browsers with known versions, if you couldn't be more specific about the whitelist of Android browsers that accept WebP rather than relying on the blacklist to catch those that don't. I haven't personally haven't looked at your code yet to see how this works, so maybe you're doing that already. I'm happy to take a look. We're aware of the other transcoding solutions and I'm sure there are efforts to reach out to them as well. I just spoke at Velocity Conference and had the opportunity to discuss this personally with a few performance solution providers. We are doing our best to be good citizens by working with folks such as yourself ahead of time. You being one of the largest, we certainly want to work closely with you to offer any support we can to make sure you're aware of our changes and how they'll affect your logic. As I said, I'm a performance guy on the team, and I'm mainly interested in this module mainly from a performance perspective. The reasoning behind their UA string strategy isn't something I'm spending much time in. I just know that the team has been testing and crawling the web with various strings in an attempt to get even more desktop interoperability. In either case, we sincerely do appreciate your willingness to help this move forward. Will you be able to accept the patch with the pattern we provided as is or do you need something else from us?  Original comment by tob...@microsoft.com on 23 Sep 2014 at 12:47

### GoogleCodeExporter commented Apr 6, 2015

 Like I said, I don't like matching on "/Edge" as it is not clear where that comes from, and it's not in the IE string today. But I think n.parashuram's earlier idea of combining Accept with UA for Chrome, but matching on UA only for Android will work and solve the problem as beset as we can tell. I would still really like to know why this masquerading for desktop IE 11 is being considered at all. Don't all web sites work reasonably well in IE 11, since it has very high market share on desktop? I'm re-opening this bug so we can track the filtering of Chrome/webp detection on Accept.  Original comment by jmara...@google.com on 23 Sep 2014 at 3:42 Changed state: Accepted

### GoogleCodeExporter commented Apr 6, 2015

 Ah, sorry if I misunderstood. Your stated approach seems good. I'm imagining that this change is slightly bigger than the previous change and would take longer to update. Given that, do you have an ETA for the fix? I sincerely appreciate your expertise here. I owe you a growler of beer!  Original comment by tobin.ti...@gmail.com on 23 Sep 2014 at 4:45

### GoogleCodeExporter commented Apr 6, 2015

 We recently released MPS 1.9.32.1 (which does not have even the windows-mobile fix yet). Our plan is to do a patch release 19.9.32.2 which will have that fix, the desktop fix, and also some other fixes. I don't have a date for that patch release, but we will certainly do the patch release and ensure it looks good before marking it 'stable', which is where most of the users will benefit. I would still really like to know why this masquerading for desktop IE 11 is being considered at all. Don't all web sites work reasonably well in IE 11, since it has very high market share on desktop?  Original comment by jmara...@google.com on 23 Sep 2014 at 5:02

### GoogleCodeExporter commented Apr 6, 2015

 OK. Thanks for the info. "I would still really like to know why this masquerading for desktop IE 11 is being considered at all. Don't all web sites work reasonably well in IE 11, since it has very high market share on desktop?" Yeah, I'm unfortunately not on that team and I'm not privy to why they are doing what they do with the UA. I would guess it's for the same reasons that Chrome includes "Safari" and "Gecko" in their UA string. To be honest, I'm not even sure why that is either.  Original comment by tobin.ti...@gmail.com on 23 Sep 2014 at 5:09

### GoogleCodeExporter commented Apr 6, 2015

 I have a patch that does what I claimed -- it doesn't do anything special for IE but it looks for an accept:image/webp header if the UA is Chrome (not fully trusting that the word 'Chrome' in the UA is sufficient). If the UA is Android it doesn't need that (and won't get it), but we'd send the Android browser webp anyway. It has not been through the full testing suite or code review yet but I thought I'd expose it early to see if there are comments externally. I still think it's a bad idea for other browsers to masquerade as Chrome, as there are other crazy high impact things we think we can get away when we think we are talking to Chrome, such as defer_javascript. Tobin: if you can relate our concerns to the team controlling the UA I'd appreciate it.  Original comment by jmara...@google.com on 24 Sep 2014 at 9:17 Attachments:

### GoogleCodeExporter commented Apr 6, 2015

 Original comment by jmara...@google.com on 24 Sep 2014 at 9:22 Attachments:

### GoogleCodeExporter commented Apr 6, 2015

 Thanks both for the advice and for the patch. I'll certainly pass this along to the app team.  Original comment by tob...@microsoft.com on 24 Sep 2014 at 10:06

### GoogleCodeExporter commented Apr 6, 2015

 https://code.google.com/p/modpagespeed/source/detail?r=4265  Original comment by jmara...@google.com on 25 Sep 2014 at 5:21 Changed state: Fixed Added labels: Milestone-v32, release-note
mentioned this issue Apr 7, 2019