Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mod_pagespeed should strip query-params for pagespeed resources before cache lookup #192

Closed
GoogleCodeExporter opened this issue Apr 6, 2015 · 15 comments

Comments

@GoogleCodeExporter
Copy link

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

What steps will reproduce the problem?
1. appeared randomly after a long period (weeks) running mod_pagespeed 
2. the following in HTML document HEAD (multiple websites):

  <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/form_formatting.css">
  <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/table_formatting.css">

3. may be triggered by requests from webcache.googleusercontent.com

What is the expected output?
e.g.
  <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/form_formatting.css+gday_formatting.css.pagespeed.cc.2_ZYbhEFlY.css">

What do you see instead?
e.g.
  <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css%22">

(the extra '%22' in the address is causing the CSS not to be read by the 
browser)

First sign in apache logs was 24 hours ago:

68.209.129.173 - - [20/Jan/2011:07:48:19 +0100] "GET 
/form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css%22 
HTTP/1.1" 200 1871 
"http://webcache.googleusercontent.com/search?q=cache:FHpe41nOywQJ:www.the-art-o
f-web.com/html/character-codes/+javascript+character+%26raquo%3B&cd=9&hl=en&ct=c
lnk&gl=us" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; 
SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 
3.0.30729; .NET4.0C)"

which then propagated for 20 minutes (total of 63 occurrences).

Then today it started again, around 2 hours ago, and is continuing. The first 
instance was:

89.241.2.254 - - [21/Jan/2011:14:55:08 +0100] "GET 
/form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css%22 
HTTP/1.1" 200 1871 
"http://webcache.googleusercontent.com/search?q=cache:auh-6vJ3x9wJ:www.the-art-o
f-web.com/css/bordercollapse/1/+border-collapse+separate&cd=1&hl=en&ct=clnk&clie
nt=firefox-a" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) 
Gecko/20101027 Fedora/3.6.12-1.fc13 Firefox/3.6.12"

and the most recent:

94.34.40.110 - - [21/Jan/2011:16:14:37 +0100] "GET 
/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css%22 
HTTP/1.1" 200 1871 "http://www.the-art-of-web.com/css/border-radius/" 
"Mozilla/5.0 (X11; U; Linux i686; it; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 
(maverick) Firefox/3.6.13"

So far today, 362 occurrences and counting.

No signs in preceding one week of log files and we haven't notices a similar 
problem in the last couple of months.


What version of the product are you using? 0.9.14.6-r358

On what operating system? Debian GNU/Linux

Which version of Apache? 2.2.9

Which MPM?

  ModPagespeedDisableFilters inline_css,inline_javascript
  ModPagespeedEnableFilters move_css_to_head
  ModPagespeedEnableFilters rewrite_css,rewrite_javascript
  ModPagespeedEnableFilters outline_css,outline_javascript


Please provide any additional information below, especially a URL or an
HTML file that exhibits the problem.

Occurring on multiple pages across multiple websites on the same server.
Including: http://www.the-art-of-web.com/system/mod-pagespeed/



Original issue reported on code.google.com by dun...@chirp.com.au on 21 Jan 2011 at 3:17

@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

%22 is an encoding that browsers use for double-quote.  Note that mod_pagespeed 
itself does not (as far as I know) use % for encoding anything.

Original comment by jmara...@google.com on 21 Jan 2011 at 3:27

  • Changed state: Accepted
@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

[deleted comment]
@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

Looking at your test requests I was able to fix the problem for now by directly 
requesting the CSS URL with "?" instead of "%22" at the end:
GET 
http://resources.chirp.com.au/form_formatting.css+table_formatting.css.pagespeed
.cc.p2rjIj5pXZ.css?

Now all the sites referencing the same file are including the '?'.  Looks like 
essentially any characters can be added that way and will persist for future 
page requests.

Original comment by dun...@chirp.com.au on 21 Jan 2011 at 3:58

@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

Original comment by sligocki@google.com on 21 Jan 2011 at 5:23

@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

Now that I know what to look for I have found several occurences of this
problem in previous log files (going back 10 days).  Here's a summary of
those requests:

# zgrep -h "GET /form_formatting.css+table_formatting.css.pagespeed" 
combined_log* | grep -v "css HTTP" | awk '{print $7}' | sort | uniq -c

      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css'
      3 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css\"
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css%00'
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?1%00'=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?1'=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?1\"=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?117645206%20or%201%3d1--%20=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?117645206%20or%201%3d2--%20=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?119164163'%20or%201%3d1--%20=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?119164163'%20or%201%3d2--%20=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?1'%20and%201%3d1--%20=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?1%20and%201%3d1--%20=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?1'%20and%201%3d2--%20=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?1%20and%201%3d2--%20=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?1%2527=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css15926055'%20or%201%3d1--%20
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css15926055'%20or%201%3d2--%20
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css'%20and%201%3d1--%20
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css'%20and%201%3d2--%20
    703 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css%22
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css%22?
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css%222
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css%2527
      4 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?8218e55e94bbef1281ec670e=1
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css8218e55eeb513c4c93fcda47
      3 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?abc
    109 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?bhu
      3 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?cssReloader=1290816000000
   1519 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?cssReloader=1292181283290
    304 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?cssReloader=1293372373000
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?cssReloader=1293372373000&www.bsitecnologia.com.br
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?def
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?fghj
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?.google
     47 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?jCustomerWAPProv
      2 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?jjj
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUS.css?l
      1 /form_formatting.css+table_formatting.css.pagespeed.cc.Er27ofunUT.css%22
     56 /form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css?
      2 /form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css\"
    687 /form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css%22
     25 /form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css?8218e55e94bbef1281ec670e=1
      4 /form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css?abc
    583 /form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css?cssReloader=1290816000000
    157 /form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css?cssReloader=1292181283000
    727 /form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css?cssReloader=1292181283290
      4 /form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css%debugging
     54 /form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.cssDebugging

FYI I have applied a patch on our server using mod_rewrite to handle most
of these requests:

    # to patch combine_css bug in mod_pagespeed
    RewriteCond %{QUERY_STRING} .
    RewriteRule (.*\.css)$ /$1? [R=301,L]
    RewriteRule (.*\.css)\"$ /$1? [R=301,L]

Original comment by dun...@chirp.com.au on 22 Jan 2011 at 10:57

@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

I've just scoured the code that is responsible for generating that URL and 
writing it to the HTML file.  I cannot find any vulnerability in it, but we 
will keep looking.

But it should be pretty easy for us to reproduce this if you provide a URL that 
exhibits the problem.  Is http://www.the-art-of-web.com/system/mod-pagespeed/ 
the place where you saw that href?  It's not there now: the href in the HTML 
(using Firefox View->Source) looks good.  The mod_rewrite workaround you did 
would not affect the HTML source; only Apache's handling of it.


The mess in your log looks very strange.  It's not just random garbage appended 
to your ".css" -- it's specific strings.  In particular I'm curious about 
"cssReloader".  That string is not present in the mod_pagespeed source, nor the 
Apache httpd source.  I don't see it on the-art-of-web.com/system/mod-pagespeed 
either.  It must be part of your site or some other module that's loaded into 
your server.  Does that string "cssReloader" ring a bell with you?  I looked at 
a couple of other pages on the-art-of-web.com but did not see it.

What other modules do you have loaded into Apache?  Do you have other 
mod_rewrite rules?

Another question: how much repetition is there in the log?  If you remove the 
'uniq' how many more lines will you get?


Original comment by jmara...@google.com on 22 Jan 2011 at 1:54

@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

In your original description, you said:

    3. may be triggered by requests from webcache.googleusercontent.com

What did you mean by that?  Does that hostname match that the referring IP 
address you saw in you access log with these requests?

Original comment by jmara...@google.com on 22 Jan 2011 at 2:02

@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

I think you've missed the point slightly about what's happening.

We have a number of websites on the same server including the same
external style sheets:

  <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/form_formatting.css">
  <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/table_formatting.css">

With Mod Pagespeed enabled this should be (and is) rewritten to:

  <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css">

Things go pear-shaped when a request is made for a variant of that
address.  For example:

- a request from webcache.googleusercontent.com appends "%22" to the end; or
- a request from Firefox with the CSS Reloader plugin appends 
"?cssReloader=1290816000000"; or
- a Google engineer (y/day) appends "Debugging":

74.125.60.1 - - [21/Jan/2011:16:42:25 +0100] "GET 
/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.cssDebugging 
HTTP/1.1" 200 1871 "-" "Mozilla/5.0 (X11; U; Linux x86_64; en-US) 
AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.67 Safari/534.13"

Whatever string is appended now appears in pages as they're served to
other user agents:

82.77.216.134 - - [21/Jan/2011:16:42:50 +0100] "GET 
/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.cssDebugging 
HTTP/1.1" 200 1871 "http://www.dws.ro/" "Mozilla/4.0 (compatible; MSIE 8.0; 
Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; 
.NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; MS-RTC LM 8)"
128.30.52.103 - - [21/Jan/2011:16:43:33 +0100] "GET 
/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.cssDebugging 
HTTP/1.1" 200 3971 "-" "Jigsaw/2.2.5 W3C_CSS_Validator_JFouffa/2.0"
220.181.108.155 - - [21/Jan/2011:16:45:09 +0100] "GET 
/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.cssDebugging 
HTTP/1.1" 200 1871 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"

I've observed this continuing for periods of 20 minutes, 40 minutes and
over 2 hours.  I don't know what makes it go away, but eventually it does.

The last instance was approx. 4 hours ago.  I believe the mod_rewrite
patch will fix the problem by preventing CSS requests with extra
characters/parameters from being seen (and cached s/h?) by Mod Pagespeed.

The list of addresses above was generated using "uniq -c" - so the numbers
in the LHS column tally the number of occurences.

If you want details of other modules installed or rewrite rules I can send
them to a private address..

Original comment by dun...@chirp.com.au on 22 Jan 2011 at 2:26

@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

OK I understand now.  You said in the description that you saw HTML of this 
form:

 <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css%22">

But if all that's happening is that query-params or escaped spaces are being 
appended onto mod_pagespeed's rewritten URL by user-agents or FF plugins, then 
we're having a different conversation.

Can you review for me what symptom you are seeing?  It's true that 
mod_pagespeed will not cache as effectively as desired in the presence of these 
query-params.   Have you been seeing other issues?  For example, if I take 
another mod_pagespeed-enabled site, presumably one that doesn't have your 
mod_rewrite rule, I find that attaching extra query-params to the request does 
not prevent mod_pagespeed from handling it.

Specifically, mod_pagespeed will first check the cache using the exact URL.  
This will fail.  But then it will decode the URL (e.g. look at the different 
CSS files concatenated with "+" in the URL) and re-execute the combination in 
response to the request.  So it should be working despite the cache miss.

I think we need to put your mod_rewrite rule into C++ code in mod_pagespeed 
itself, though, so that we get cache hits by ignoring the extra query-params.  
Note that this will not affect your logs: you'll still see the full variety of 
query-params in your access logs.  But it will make your system run more 
efficiently.


Summary was: has introduced "%22" at end of combined CSS LINK url

Original comment by jmara...@google.com on 22 Jan 2011 at 3:02

  • Changed title: mod_pagespeed should strip query-params for pagespeed resources before cache lookup
  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect
@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

The problem isn't that "query-params or escaped chars are being appended onto 
mod_pagespeed's rewritten URL", but that those same parameters then appear in 
the HTML served to OTHER user agents on OTHER pages and/or OTHER websites 
sharing the same resource string, in some cases causing the CSS to not be 
loaded/parsed by the web browser.

Original comment by dun...@chirp.com.au on 22 Jan 2011 at 3:08

@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

[deleted comment]
@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

I've removed the mod_rewrite rules allowing this to be reproduced:

1. View the HTML source code: http://www.the-art-of-web.com/

  <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css">

2. Load the following address: 
http://resources.chirp.com.au/form_formatting.css+table_formatting.css.pagespeed
.cc.p2rjIj5pXZ.css%22
3. View the HTML source code: http://www.the-art-of-web.com/

  <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css%22">

4. Observe that the Send Feedback form on that page now loses most of it's 
formatting.
5. Also for other websites/pages: http://www.aavpa.org/contact.html

  <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css%22">

6. Load the following address: 
http://resources.chirp.com.au/form_formatting.css+table_formatting.css.pagespeed
.cc.p2rjIj5pXZ.css?
7. View the HTML source code: http://www.the-art-of-web.com/

  <link rel="stylesheet" type="text/css" href="http://resources.chirp.com.au/form_formatting.css+table_formatting.css.pagespeed.cc.p2rjIj5pXZ.css?">

8. Repeat until convinced.

Original comment by dun...@chirp.com.au on 22 Jan 2011 at 3:45

@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

Sorry about the confusion.  This is pretty clear now.  We should be able to 
repro & fix.

Original comment by jmara...@google.com on 22 Jan 2011 at 4:15

  • Added labels: Type-Defect
  • Removed labels: Type-Enhancement
@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

FYI we have reproduced the problem and are working on a fix.  Once again, 
thanks for the detailed bug report.  We'll have this out ASAP.

Original comment by jmara...@google.com on 24 Jan 2011 at 7:27

@GoogleCodeExporter
Copy link
Author

@GoogleCodeExporter GoogleCodeExporter commented Apr 6, 2015

Fixed in r388. Hopefully will have a new release including it soon.

Original comment by morlov...@google.com on 25 Jan 2011 at 8:47

  • Changed state: Fixed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant
You can’t perform that action at this time.