This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Server-side includes are stripped by remove_comments and rewrite_css #182

Closed
GoogleCodeExporter opened this issue Apr 6, 2015 · 70 comments

Comments

@GoogleCodeExporter

Based on a report from torsten@tributh.net, it seems that server-side includes 
(mod_include?) are running after mod_pagespeed.  To mod_pagespeed, server-side 
includes look like HTML comments, so mod_pagespeed removes them if the site 
owner enabled remove_comments.

This can be made to work if we can figure out how to get mod_pagespeed to run 
*after* mod_include.

Original issue reported on code.google.com by jmara...@google.com on 10 Jan 2011 at 1:57

@GoogleCodeExporter

Original comment by jmara...@google.com on 10 Jan 2011 at 2:01

@GoogleCodeExporter

Hey,

Actually SSIs are processed by Varnish - so it will always execute after 
mod_pagespeed.

The simple solution is to disable mod_deflate when a page containing 
esi:include tags is found. This is not specific to the remove_comments filter 
but a general mod_ps/varnishd esi incompatibility.

CFR http://cd34.com/blog/infrastructure/no-esi-processing-first-char-not/


Original comment by robbie.g...@gmail.com on 10 Jan 2011 at 2:10

@GoogleCodeExporter

OK; makes sense.  This is the first I've heard about ESI, but we should take a 
look at that too.  To be clear, ESI is entirely distinct from SSI, the latter being 
processed by mod_include in Apache, and the former being processed in Varnish.  
Correct?

And they use different syntax as well: 

SSI:  <!--#include virtual="/footer.html" -->
ESI:  <esi:include>, <esi:remove> and <!--esi ... -->

Ideally mod_pagespeed would not see the SSI because they would be processed 
upstream.  However, if that proves impossible we could also teach mod_pagespeed 
about that special syntax (like it knows about IE directives and avoids 
removing those).
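
As a rough illustration of what "teaching" a comment remover about the special syntax could mean, here is a minimal Python sketch. It is not mod_pagespeed's implementation: the regex-based approach, the function name `remove_comments` and the list of preserved prefixes are all assumptions made for this example.

```python
import re

# Matches any HTML comment, non-greedily, across newlines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Comment prefixes that carry meaning for a downstream processor.
# Illustrative assumption only: SSI directives, ESI comments, IE conditionals.
PRESERVED_PREFIXES = ("<!--#", "<!--esi", "<!--[if")

def remove_comments(html):
    """Drop ordinary HTML comments but keep SSI, ESI and IE conditional comments."""
    def keep_or_drop(match):
        comment = match.group(0)
        return comment if comment.startswith(PRESERVED_PREFIXES) else ""
    return COMMENT_RE.sub(keep_or_drop, html)

page = (
    '<p>hello</p><!-- build 42 -->\n'
    '<!--#include virtual="/footer.html" -->\n'
    '<!--[if IE]><p>IE only</p><![endif]-->'
)
print(remove_comments(page))
```

A real HTML rewriter would work on a parsed token stream rather than a regex, so this only shows the filtering rule, not how it would be wired in.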

Original comment by jmara...@google.com on 10 Jan 2011 at 2:25

@GoogleCodeExporter

yeah, this is very important for me.  i just tried mod_pagespeed and it broke 
all my pages for the reason described here. all my webpages make heavy use of 
SSI, e.g.

<!--#include virtual="/footer.html" -->

if mod_pagespeed always runs upstream from server includes, then the 
remove_comments filter should leave all SSI comments untouched.

this is not limited to comments like 
<!--#include virtual="/footer.html" -->

it should also recognize the other comments like
<!--#exec ... -->
<!--#if ... -->
<!--#endif -->
and all the other special comments handled by SSI.

i'm so frustrated that the mod_pagespeed remove_comments filter is not compatible 
with using SSI :(

Original comment by loupi...@gmail.com on 9 Mar 2011 at 6:39

@GoogleCodeExporter

also, to work around this problem (until it's fixed), i want to still enable 
the "remove_comments" filter on css and javascript files, but disable it on 
html files.

but there seems to be no way to do that. enabling / disabling filters appears 
to be global (i.e. for all types of files). and ModPagespeedDisallow will 
globally disable all filters on some pages, which is not good either (i tried 
ModPagespeedDisallow on html files, and it prevents the re-writing of the css 
and js includes to use the cached versions, thus completely defeating the 
entire pagespeed module).

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:05

@GoogleCodeExporter

another point that is important (at least to me): i use SSI also in javascript 
files and in css files.

so i can't even use the rewrite_javascript filter, because when mod_pagespeed 
processes and minifies my javascript files, the SSI includes are expanded and 
the cached version of the javascript includes the expanded SSI includes.  
that's not at all suitable.

for example if the javascript file includes:
referrer = '<!--#echo var="HTTP_REFERER" -->';

i want this SSI include to remain as-is in the minified file served from the 
cache, because <!--#echo var="HTTP_REFERER" --> will be expanded by the server 
to something different each time the script is loaded.

that's just an example, and i've got other similar cases with other env 
variables like REQUEST_URI, that change value at each request.

so mod_pagespeed is completely incompatible with any website that uses SSI, and 
i'm so sad :( 

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:26

@GoogleCodeExporter

the idea mentioned in the title of this thread is BAD: if server side includes 
(SSI) were processed before mod_pagespeed, it would completely break because 
SSI can not only include other files, but also env variables, e.g. 
<!--#echo var="HTTP_REFERER" -->.

the pages with those includes expanded should NOT be cached! if SSI was 
processed before mod_pagespeed, mod_pagespeed would cache pages with env 
variables expanded, and that would completely break SSI.

the correct way is that mod_pagespeed should leave all the SSI "special 
comments" untouched in all the processed files (including js, css etc), and 
also, mod_pagespeed should make sure that any pages fetched from the 
mod_pagespeed  cache should be processed downstream by SSI.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 8:44

@GoogleCodeExporter

Summary was: Make server-side includes work with remove_comments by tweaking 
order

Just to be clear, remove_comments only removes comments from HTML. rewrite_css 
may also remove comments from CSS. I don't believe anything removes comments 
from JavaScript.

You seem to be contradicting yourself above, in #6 you say that you still want 
to remove comments from CSS and JavaScript, but in #7 you say that you use SSI 
in both and can't have them stripped, which is it?

Right now if you don't want SSI stripped from HTML, you need to turn off 
remove_comments.

Original comment by sligocki@google.com on 9 Mar 2011 at 1:59

  • Changed title: Server-side includes are stripped by remove_comments and rewrite_css

@GoogleCodeExporter

Note that mod_pagespeed generally assumes that html content is not cacheable, 
so if you're only using server-side includes in html then you should have no 
issues with caching.  If you're using server-side includes in css, you might 
find it easier to simply include multiple css files in your html, and then use 
mod_pagespeed's combine_css filter to combine them.  I'd urge you strongly not 
to do user-agent- or referrer-based conditional inclusion anywhere except in 
html.

Original comment by jmaes...@google.com on 9 Mar 2011 at 3:28

@GoogleCodeExporter

> You seem to be contradicting yourself above, in #6 you say that you still want 
to remove comments from CSS and JavaScript, but in #7 you say that you use SSI 
in both and can't have them stripped, which is it?

i want to remove the CSS comments from the CSS (e.g. /* comments */)
and i want to remove javascript comments from the javascript
e.g.

// comment
and also /* comments */

but i do NOT want any of my SSI include html-style comments to be touched in 
any way, whether they appear in html files, in css files, in javascript files, 
or in any other type of file that is subject to SSI processing.

SSI-type comments look like:

<!--# ... --> and they can appear in any type of file that is processed by the 
SSI module, and that include css, javascript and other files, not just html.

> Right now if you don't want SSI stripped from HTML, you need to turn off 
remove_comments.

i did that, but clearly it's not enough: the caching issue still breaks 
everything, because the pages that are cached do have my SSI includes expanded, 
and that includes javascript files.

for example, try a javascript file (.js) with:

alert('this is my user agent: <!--#echo var="HTTP_USER_AGENT" -->');

you will see that the same user-agent will be displayed regardless of the 
actual browser you use to load the page.  that's because mod_pagespeed will 
cause the .js file to be cached AFTER the SSI has been expanded, so when 
another person accesses the file with another user-agent, the page that will be 
served will contain something like:

alert('this is my user agent: Mozilla 5.0 (compatible [...]');

and the SSI processing will NOT happen because the SSI comment is not in the 
mod_pagespeed cached page anymore.

the problem here is that mod_pagespeed caches the pages AFTER SSI has been 
processed.  it should cache pages BEFORE SSI is processed, because SSI should 
happen on all mod_pagespeed cached pages.
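
the difference between the two orderings can be sketched with a toy simulation in Python. everything here is an assumption made for illustration: `expand_ssi` only understands the echo directive, the cache is a plain dict, and neither function reflects how mod_pagespeed or mod_include are actually implemented.

```python
import re

# Toy SSI expander: handles only <!--#echo var="NAME" -->.
ECHO_RE = re.compile(r'<!--#echo var="([A-Z_]+)" -->')

def expand_ssi(text, env):
    """Replace each echo directive with the matching value from env."""
    return ECHO_RE.sub(lambda m: env.get(m.group(1), "(none)"), text)

SOURCE_JS = "alert('this is my user agent: <!--#echo var=\"HTTP_USER_AGENT\" -->');"

def serve_cache_after_ssi(cache, env):
    # The first request's expanded output is stored and replayed to everyone.
    if "all.js" not in cache:
        cache["all.js"] = expand_ssi(SOURCE_JS, env)
    return cache["all.js"]

def serve_cache_before_ssi(cache, env):
    # The unexpanded source is stored; SSI runs again on every request.
    if "all.js" not in cache:
        cache["all.js"] = SOURCE_JS
    return expand_ssi(cache["all.js"], env)

firefox = {"HTTP_USER_AGENT": "Firefox"}
chrome = {"HTTP_USER_AGENT": "Chrome"}

stale = {}
serve_cache_after_ssi(stale, firefox)
print(serve_cache_after_ssi(stale, chrome))   # second visitor still gets the first visitor's value

fresh = {}
serve_cache_before_ssi(fresh, firefox)
print(serve_cache_before_ssi(fresh, chrome))  # second visitor gets their own value
```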

also, i noticed another problem: when remove_comments is disabled, it appears 
to not remove the /* comments */ from css files, even if rewrite_css is used. i 
am not sure if this is by design or not.

in any case, because of the cache issue, even if i disable remove_comments, 
it completely breaks my site, as SSI directives are not processed dynamically on 
each file that is accessed.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 5:46

@GoogleCodeExporter

> I'd urge you strongly not to do user-agent- or referrer-based conditional 
inclusion anywhere except in html.

i use it in javascript for very good reasons: there are implementations of 
javascript that do not give access to some of the env variables that the server 
has access to, like user-agent, referrer, etc, and in some cases it is extremely 
useful to use SSI in javascript.

but in any case, the problem would be the same for html: the caching of the 
pages should always happen BEFORE any SSI include is processed - and SSI 
comments should remain completely untouched in any type of file filtered or 
processed by mod_pagespeed.

i think if those two conditions were true, mod_pagespeed would work well in 
combination with SSI.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 5:51

@GoogleCodeExporter

so, if i understand well, if i put all my javascript snippets that use SSI in 
in-line script blocks in html files, since html won't be cached, then it 
should allow me to have the minified javascript (cached) working.

but still, my HTML files have very significant amount of comments, and i really 
want them to be stripped (but NOT the SSI comments!), and i never want any page 
with SSI to be cached, unless the caching occurs before the SSI comments are 
processed.

it might be worth a try, but using mod_pagespeed only for javascript 
minification might not be worth the effort in the case of my site.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 6:05

@GoogleCodeExporter

Hi all, I intend to pick up this issue again soon.  I will review all the 
information in this thread.  But for now:

remove_comments applies only to HTML comments.  We could potentially teach 
remove_comments to leave in the SSI syntax, but I'd prefer making SSI run 
upstream of mod_pagespeed.

All HTML coming out of mod_pagespeed should be marked un-cacheable.  If it is 
coming out of your server marked as cacheable then either (a) you have a very 
old version of mod_pagespeed, (b) you are doing something very unnatural in 
your apache configuration to defeat mod_pagespeed here, or (c) we have a bug and 
I need to know about it in more detail.  If (c), please open a new issue with a 
detailed description of your apache configuration.  [note: at some point in the 
future we may allow mod_pagespeed's HTML output to be cacheable but today we 
always mark it uncacheable]

rewrite_css should remove CSS comments, but will only touch files that are 
marked as cacheable.

Similarly, Javascript files should only be touched if they are marked as 
cacheable.  If they are using SSI then they should probably be marked 
non-cacheable.

rewrite_javascript indeed removes JS comments.  It is not sensitive to SSI.  
This will not matter if SSI runs before mod_pagespeed.

Original comment by jmara...@google.com on 9 Mar 2011 at 6:38

@GoogleCodeExporter

> remove_comments applies only to HTML comments.  We could potentially teach  
remove_comments to leave in the SSI syntax, but I'd prefer making SSI run  
upstream of mod_pagespeed.

that is really the WORST solution in my opinion, especially if caching is 
involved.

if SSI was done downstream, and the SSI comments were always left untouched, 
and pages were cached before any SSI processing, then everything would 
work just fine.

you should really think of all the consequences.  many people use SSI in 
various ways and for various purposes, and they all rely on it to work exactly 
as advertised. whatever mod_pagespeed does, it should not break sites using SSI 
(whether in html, in scripts, or in any other pages).

another reason we use SSI in javascript is to make conditional code based on 
the SERVER_HOST env variable, for example. so even if minified and cached, the 
selection of the code in the javascript file is based on a server-side SSI 
environment variable test, and this test should always work. we rely on that.

i'm sure many other people use SSI in various other ways and for also very good 
implementation reasons. you cannot say: don't use SSI in javascript files, for 
example.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 6:47

@GoogleCodeExporter

[deleted comment]

@GoogleCodeExporter

oops, i meant HTTP_HOST, not SERVER_HOST.

e.g. many of our .js files contain things like:

<!--#if expr="\"${HTTP_HOST}\" = /one_of_my_host_name/" -->

some javascript code that should run only when the request is on that server

<!--#endif -->

(note: our scripts are run on different domains, that are all served by the 
same server)

Original comment by loupi...@gmail.com on 9 Mar 2011 at 6:52

@GoogleCodeExporter

Thanks for all the commentary about SSI.  One thing that would really help is 
if you could provide kind of a minimal web-site-in-a-tarball containing 
examples of how you use SSI in JS and HTML.  We'd then try to build a testcase 
out of those and define this bug as fixed when that testcase worked.

I still don't understand why you want mod_pagespeed to run upstream of SSI.  
Either way, we need to prevent caching of mod_pagespeed-generated HTML.

I think that we will not be able to rewrite any JS that has server-side 
includes because the result would presumably vary.  In the absence of 
mod_pagespeed, are you serving cacheable JS that is generated differently 
depending on user-agent?  If so you would probably want to mark that with 
Cache-Control: private, which allows browser-caching but prevents proxy-caching.  
It will also prevent mod_pagespeed from optimizing the resource, which would be 
the right thing from a functional perspective.  If you are using SSI to 
generate cacheable Javascript that varies based on something other than 
user-agent then I don't see how that can work -- the user's browser would hold 
Javascript cached from server A even when served...well I guess it wouldn't be 
re-served if it was cached.  But you get the idea.

I guess I need to understand the scenario and what you are trying to achieve at 
a higher level.


You can work around these thorny issues by evaluating the SSI in HTML (which in 
general we would not allow to be cached), and passing that to the cacheable JS:

HTML:   <script>window.server_host = '<!--#echo var="HTTP_HOST" -->';</script>
JS:     if (window.server_host == "one_of_my_host_name") {
           ....

You can then allow open caching of the JS file, thus enabling mod_pagespeed to 
rewrite it.


Original comment by jmara...@google.com on 9 Mar 2011 at 7:13

@GoogleCodeExporter

> I still don't understand why you want mod_pagespeed to run upstream of  
SSI.  Either way, we need to prevent caching of mod_pagespeed-generated  
HTML.

that's because it would allow caching of minified javascript that are using SSI.

e.g. in the cache you could have .js files that look like:

<!--#if expr="\"${HTTP_HOST}\" = /one_of_my_host_name/" -->
some minified javascript code
<!--#else -->
some other minified javascript code
<!--#endif -->

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:24

@GoogleCodeExporter

> You can work around these thorny issues by evaluating the SSI in HTML (which 
in general we would not allow to be cached), and passing that to the cacheable 
JS:

yes, of course, but this would involve modifying dozens or hundreds of scripts, 
and this is an ad-hoc thing and you cannot expect all the people using SSI to go 
through that.  not to mention the chance of introducing bugs.

the solution to the problem should involve a minimum modification to the 
existing files (html, js etc) and yet allow mod_pagespeed to work.

large websites cannot just rewrite hundreds of pages of html and scripts just 
to work around a bug or shortcoming of some apache module. this is not 
reasonable to assume.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:27

@GoogleCodeExporter

When you say this:
   in the cache you could have .js files that look like:

  <!--#if expr="\"${HTTP_HOST}\" = /one_of_my_host_name/" -->
  some minified javascript code
  <!--#else -->
  some other minified javascript code
  <!--#endif -->

Which cache are you talking about, in the absence of mod_pagespeed?  What about 
in the presence of mod_pagespeed?  I'm not understanding your caching strategy, 
and I'm still not understanding how the ordering of mod_include, which runs as 
an Apache output filter, and mod_pagespeed's output filter affect this.

FYI mod_pagespeed's resource-serving path is completely unrelated to the 
output-filter ordering. It fetches resources using an HTTP fetch and stores 
them in a server-side cache.  It serves those resources via an output-generator 
and puts a long cache lifetime on them so it's inappropriate to do SSI on them 
at that stage.

I really would like to understand your caching strategy in the absence of 
mod_pagespeed, so I determine whether we can do something that's consistent 
with that.


Original comment by jmara...@google.com on 9 Mar 2011 at 7:41

@GoogleCodeExporter

> Which cache are you talking about

the mod_pagespeed cache! i'm not talking "in 
the absence of mod_pagespeed".

the only other caching strategy i use is private cache (i.e. on the browser).

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:45

@GoogleCodeExporter

> FYI mod_pagespeed's resource-serving path is completely unrelated to the  
output-filter ordering. It fetches resources using an HTTP fetch and stores  
them in a server-side cache.  It serves those resources via an  
output-generator and puts a long cache lifetime on them so it's  
inappropriate to do SSI on them at that stage.

this would work fine if, when it fetches the pages, the SSI was disabled, and 
if, when it serves the pages, SSI could be run on the pages from its cache.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:48

@GoogleCodeExporter

by the way, is there a forum moderated by google employees to discuss other 
issues related to mod_pagespeed?

for example, many websites use google adsense advertising. the adsense policy 
states that modifying the adsense code provided by google (generally that's 
javascript) is not allowed.

would adsense allow minifying javascript files or filtering web pages that 
include some adsense code? technically this involves modifying the adsense 
code, but of course functionally it should not change anything. i'd like to 
have some official statement from google adsense about that. i would hate to 
see my adsense account be cancelled just because i'm trying to optimize serving 
speed using mod_pagespeed, which is a project supported by google. 

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:56

@GoogleCodeExporter

The discussion group is http://groups.google.com/group/mod-pagespeed-discuss

mod_pagespeed currently does not support privately cached resources.  
Specifically, if your JS files are served with HTTP header "Cache-Control: 
private" then mod_pagespeed will leave them alone.  So SSI should continue to 
work and continue to be cached privately.

More specifically:  mod_pagespeed today should fully support you if you (a) 
mark your privately cacheable resources as such and (b) do not enable 
remove_comments, which is off by default.  You can, if you like, enable 
rewrite_css and rewrite_javascript, which will minify any css and js files that 
are publicly cacheable.
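
To make the "publicly cacheable" rule concrete, here is a minimal Python sketch of such a check. It is only an illustration: the function name `is_publicly_cacheable`, the set of blocking directives, and the choice to treat a missing Cache-Control header as cacheable are assumptions, not mod_pagespeed's actual logic.

```python
# Directives that, in this sketch, mean "leave the resource alone".
# Assumed policy for illustration; the real rules may differ.
BLOCKING_DIRECTIVES = {"private", "no-cache", "no-store"}

def is_publicly_cacheable(headers):
    """Return True if the Cache-Control header does not forbid shared caching."""
    value = headers.get("Cache-Control", "")
    directives = {
        part.strip().split("=")[0].lower()
        for part in value.split(",")
        if part.strip()
    }
    return not (directives & BLOCKING_DIRECTIVES)

print(is_publicly_cacheable({"Cache-Control": "max-age=3600"}))
print(is_publicly_cacheable({"Cache-Control": "private, max-age=3600"}))
```

Under this sketch, a JS file that relies on SSI and is served with "Cache-Control: private" would be skipped by the rewriter, so its directives keep being evaluated on every request.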


Can you provide a URL to your site?

Original comment by jmara...@google.com on 9 Mar 2011 at 8:30

@GoogleCodeExporter

sorry, i meant:


       <IfModule pagespeed_module> 
                  ModPagespeed on
                  ModPagespeedUrlPrefix                "http://www.loupiote.com/mod_pagespeed/"
                  AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

                  ModPagespeedFileCachePath            "/var/mod_pagespeed/cache/" 
                  ModPagespeedGeneratedFilePrefix      "/var/mod_pagespeed/files/" 

                  ModPagespeedEnableFilters rewrite_javascript
                  ModPagespeedDomain loupiote.com 
        </IfModule> 

(of course when i test, i set "ModPagespeed on" - i just copied after restoring 
to something that works, i.e. OFF!!!)

Original comment by loupi...@gmail.com on 10 Mar 2011 at 12:07

@GoogleCodeExporter

and my html file contains:

<script>
<!--#include virtual="/include/my_inline_include.js" -->
my-javascript;
</script>

the SSI include works when mod_pagespeed is OFF.

when it's ON with the config i posted, the SSI include is not done in the 
inline script.

Original comment by loupi...@gmail.com on 10 Mar 2011 at 12:10

@GoogleCodeExporter

[this topic about "something happens that breaks my pages" is totally unrelated 
to server-side includes.  is that right?  can you open a new issue for that and 
describe what you are seeing?]

You are right: ModPagespeed defaults to 'on'.  We thought that if you did 
'LoadModule' or (on Ubuntu) added the sym-link from 
../mods-available/pagespeed.load to  ../mods-enabled/pagespeed.load that you'd 
want to enable it.

What I was suggesting above is that you have this in your .conf:

   AddOutputFilterByType INCLUDES text/html
   AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

I'm hoping (but have not confirmed) that this will cause mod_include to run 
upstream of mod_pagespeed.

It looks to me like we are removing <!-- ... --> comments in rewrite_javascript 
as well, so hopefully that will be fixed as well once we get the filter order 
correct.

Original comment by jmara...@google.com on 10 Mar 2011 at 12:15

@GoogleCodeExporter

still not working.

my config now is:

        <IfModule pagespeed_module>
                  ModPagespeed on
                  ModPagespeedUrlPrefix                "http://www.loupiote.com/mod_pagespeed/"

                  ModPagespeedFileCachePath            "/var/mod_pagespeed/cache/"
                  ModPagespeedGeneratedFilePrefix      "/var/mod_pagespeed/files/"

                  ModPagespeedEnableFilters rewrite_javascript

                  ModPagespeedDomain loupiote.com

                  AddOutputFilterByType INCLUDES text/html
                  AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html
        </IfModule>

> You are right: ModPagespeed defaults to 'on'.

i don't think it's a good idea. most other apache modules default to off, so 
that's not consistent.

> It looks to me like we are removing <!-- ... --> comments in 
rewrite_javascript as well,

yes, well, at least in the inline scripts.  but on the other hand, my "all.js" 
script looks like:

try {
<!--#include virtual="/js/debug.js"-->
} catch (err) {debug_alert('x=catch-all-11-' + err);}
try {
<!--#include virtual="/js/ajax.js"-->
        } catch (err) {debug_alert('x=catch-all-1-' + err); loupiote_ie6 = 1;}
try {
<!--#include virtual="/js/addthis-init.js"-->
} catch (err) {loupiote_ping_async('x=catch-all-10-' + err);}
try {
<!--#include virtual="/js/set-defaults.js"-->
        } catch (err) {loupiote_ping_async('x=catch-all-1');}
try {
<!--#include virtual="/js/get-uri-query-string.js"-->
} catch (err) {loupiote_ping_async('x=catch-all-14-' + err);}
etc...

and it appears that the SSI includes are done correctly BUT the resulting all.js 
script is NOT minified (whereas the inline scripts in my html files are 
minified).

any idea why my all.js is not minified?

Original comment by loupi...@gmail.com on 10 Mar 2011 at 12:34

@GoogleCodeExporter

You probably want
   ModPagespeedDomain *loupiote.com
rather than
   ModPagespeedDomain loupiote.com
as the latter form does not authorize http://www.loupiote.com/js/all.js.  I'm 
not entirely satisfied with that answer, however, because your home page is 
www.loupiote.com which is implicitly authorized.
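
The difference between the two patterns can be sketched in Python with simple glob matching. This assumes the pattern is compared against the host name only, using shell-style wildcards; the helper `is_authorized` is hypothetical and how mod_pagespeed actually evaluates ModPagespeedDomain may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

def is_authorized(url, patterns):
    """Return True if the URL's host matches any wildcard pattern (sketch only)."""
    host = urlsplit(url).hostname or ""
    return any(fnmatchcase(host, pattern) for pattern in patterns)

url = "http://www.loupiote.com/js/all.js"
print(is_authorized(url, ["loupiote.com"]))    # no wildcard: "www." prefix is not covered
print(is_authorized(url, ["*loupiote.com"]))   # leading wildcard covers subdomains
```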

So there's something else going on that isn't obvious to me right now.  If you 
turn on 'loglevel info', mod_pagespeed will be very verbose about what it's 
trying to do, and may print something about 'all.js'.  You'll want to leave 
'loglevel info' on only while you investigate this because we'll happily fill 
your disk for you if you leave it on indefinitely :)

RE "ModPagespeed defaulting to off" -- you can report that as a separate issue 
but I think we're not likely to change it as there are many thousands of sites 
with it installed as is and an incompatible change like that doesn't seem like 
a good plan :)  We try very hard to avoid breaking existing config files with 
our code updates.


By the way, I am not seeing X-Mod-Pagespeed headers on www.loupiote.com so I'm 
wondering if you are testing this on a different home page.  In that case it 
could be a domain-authorization issue, depending on the origin of your 
alternate home page.

Original comment by jmara...@google.com on 10 Mar 2011 at 12:45

@GoogleCodeExporter

is there a way to completely turn OFF all caching done by mod_pagespeed, so 
that i am sure that what i see is not in fact an old cached version that was 
generated by a different configuration?

yeah, the defaulting to on is unfortunate but probably too late to change.

regarding the issue with SSI, i think this should be documented early on in the 
mod_pagespeed web page documentation, and it should be made clear so that people 
don't spend time scratching their heads over what's happening in case they use SSI.

Original comment by loupi...@gmail.com on 10 Mar 2011 at 1:43

@GoogleCodeExporter

> By the way, I am not seeing X-Mod-Pagespeed headers on www.loupiote.com so 
I'm wondering if you are testing this on a different home page.  In that case 
it could be a domain-authorization issue, depending on the origin of your 
alternate home page.

that's because i turned mod_pagespeed off (i have to turn it off until i can 
make it work).  i just turn it on briefly for testing. i guess i should make a 
test page that is authorized, and unauthorize mod_pagespeed on all the other 
pages, using the filters. i'll try that.

Original comment by loupi...@gmail.com on 10 Mar 2011 at 1:46

@GoogleCodeExporter

making some progress (i think), but i need help understanding why some files 
are not seen at all by mod_pagespeed.

I have set LogLevel info, so i see the mod_pagespeed messages in my error_log 
when a file is processed.

here is my config, and i'll leave it "on" (active), so you can try too.

        <IfModule pagespeed_module>
                  ModPagespeed on
                  ModPagespeedUrlPrefix                "http://www.loupiote.com/mod_pagespeed/"

                  ModPagespeedFileCachePath            "/var/mod_pagespeed/cache/"
                  ModPagespeedGeneratedFilePrefix      "/var/mod_pagespeed/files/"

                  # disable CoreFilters:                                                            
                  ModPagespeedRewriteLevel PassThrough                                              

                  ModPagespeedEnableFilters rewrite_javascript

                  ModPagespeedDomain *loupiote.com

                  AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

                  ModPagespeedDisallow *
                  ModPagespeedAllow *.js
                  ModPagespeedAllow *833171367.shtml
        </IfModule>

the goal is to process only the html file 833171367.shtml (for testing), and 
all the .js files.
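
the intent of the Disallow / Allow lines above can be sketched in Python as an ordered rule list where the last matching rule wins. this is an assumption about the evaluation order made for illustration, with a hypothetical helper `is_allowed` and shell-style wildcards applied to the full URL; it is not taken from the mod_pagespeed source.

```python
from fnmatch import fnmatchcase

# Rules in the same order as the config above.
RULES = [
    ("disallow", "*"),
    ("allow", "*.js"),
    ("allow", "*833171367.shtml"),
]

def is_allowed(url, rules, default=True):
    """Apply rules in order; the last rule whose pattern matches decides."""
    allowed = default
    for action, pattern in rules:
        if fnmatchcase(url, pattern):
            allowed = (action == "allow")
    return allowed

for url in (
    "http://www.loupiote.com/js/all.js",
    "http://www.loupiote.com/photos/833171367.shtml",
    "http://www.loupiote.com/css/menu.css",
):
    print(url, is_allowed(url, RULES))
```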

the html file is processed, i can see some info messages in the error_log, and 
i can see the
X-Mod-Pagespeed: 0.9.15.3-404
in the header when i run:

$ curl -D - -o /dev/null http://www.loupiote.com/photos/833171367.shtml

but the .js files are not processed: no message in the error_log, and no 
X-Mod-Pagespeed header.

e.g.

$ curl -D - -o /dev/null http://www.loupiote.com/photos/833171367.shtml

====> so my first question here is: why this config does not appear to process 
the .js files at all?

regarding the HTML, this config appears to not break the in-line scripts (the 
SSI is correctly included in the in-line scripts), but it does somehow modify 
the html causing a visual difference in the page layout (or some inline 
javascript has been tampered with). not sure why, since the core filters are 
disabled.

the effect of mod_pagespeed on this page (with Chrome) is one "blank" line 
added just above the photo, under the "Change image size" line, and within the 
orange frame that appears when hovering. if you go to any other photo page on 
my site (click "random photo" in the menu), you will see that this "blank" line 
is not there.

i'm not sure exactly what causes it (yet), but this is apparently a bug i.e. 
mod_pagespeed modifies something that it should not modify...  

Original comment by loupi...@gmail.com on 10 Mar 2011 at 4:20

@GoogleCodeExporter

the effect on the page layout is, once again, related to the use of SSI in my 
html.

the page served by mod_pagespeed contains:

<div id="photo-page-image-container">title="Click to download image"
 >

the source page (before SSI) contains:

<div id="photo-page-image-container" <!--#include virtual="/include/photo/photo-page-image-container-attributes.shtml"--> >

yes, i know, it looks bad, but the SSI part is normally replaced by :

title="Click to download image"

so the page served by apache is syntactically correct:

<div id="photo-page-image-container" title="Click to download image" >

but for some reason, mod_pagespeed modifies the page before processing the SSI. 
 i think it closes that div tag because it thinks it was not closed when it 
sees the SSI comment.

i.e. it changes

<div id="photo-page-image-container" <!--#include virtual="/include/photo/photo-page-image-container-attributes.shtml"--> >

into:

<div id="photo-page-image-container"> <!--#include virtual="/include/photo/photo-page-image-container-attributes.shtml"--> >

then the SSI is processed, causing:

<div id="photo-page-image-container">title="Click to download image"
 >

instead of:

<div id="photo-page-image-container" title="Click to download image" >

if SSI is processed before mod_pagespeed, it should be processed before 
any modification by mod_pagespeed.

that's because the html may be syntactically incorrect before SSI, but 
perfectly correct after SSI.

so any parsing and filtering by mod_pagespeed should be done after SSI, in 
order to prevent this sort of problem.
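
the two orderings can be reproduced with a toy include expander in Python. this is only a sketch: `expand_includes` handles nothing but the include directive, and the content of the included file (without its trailing newline) is assumed from the output shown above.

```python
import re

# Toy SSI expander: handles only <!--#include virtual="..." -->.
INCLUDE_RE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')

PATH = "/include/photo/photo-page-image-container-attributes.shtml"
# Assumed content of the included file (trailing newline omitted for simplicity).
FILES = {PATH: 'title="Click to download image"'}

def expand_includes(text, files):
    """Replace each include directive with the content of the named file."""
    return INCLUDE_RE.sub(lambda m: files[m.group(1)], text)

directive = '<!--#include virtual="%s"-->' % PATH
source = '<div id="photo-page-image-container" ' + directive + ' >'
# The same line after a rewriter has closed the <div> tag before SSI runs.
rewritten_first = '<div id="photo-page-image-container"> ' + directive + ' >'

print(expand_includes(source, FILES))           # SSI first: one well-formed tag
print(expand_includes(rewritten_first, FILES))  # rewrite first: attribute leaks into the text
```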

do you agree with my analysis there?

Original comment by loupi...@gmail.com on 10 Mar 2011 at 4:39

@GoogleCodeExporter
Copy link
Author

i discovered another problem: mod_pagespeed overwrites my Cache-Control 
headers, so i had to disable it until i can find a solution to that 
Cache-Control issue, too.

see:
http://code.google.com/p/modpagespeed/issues/detail?id=232&can=4&colspec=ID%20Type%20Status%20Priority%20Milestone%20Modified%20Owner%20Summary

Original comment by loupi...@gmail.com on 10 Mar 2011 at 7:49

@GoogleCodeExporter
Copy link
Author

well, i re-enabled mod_pagespeed for only 3 files, so it will not mess with 
my Cache-Control on all the other files, for now.

but still no luck with the js and css files, they are not going through 
mod_pagespeed for some reason, and i don't understand why.

here is my config now:

        <IfModule pagespeed_module> 
                  ModPagespeed on
                  ModPagespeedUrlPrefix                "http://www.loupiote.com/mod_pagespeed/"

                  ModPagespeedFileCachePath            "/var/mod_pagespeed/cache/" 
                  ModPagespeedGeneratedFilePrefix      "/var/mod_pagespeed/files/" 

                  ModPagespeedEnableFilters rewrite_javascript
                  ModPagespeedEnableFilters rewrite_css

                  ModPagespeedDomain *loupiote.com 

                  AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

                  ModPagespeedDisallow *
                  ModPagespeedAllow *debug.js
                  ModPagespeedAllow *menu.css
                  ModPagespeedAllow *833171367.shtml
        </IfModule> 

here is what i get:

$ curl -I http://www.loupiote.com/photos/833171367.shtml
HTTP/1.1 200 OK
Date: Thu, 10 Mar 2011 08:00:02 GMT
Server: Apache/2.2.9 (Fedora)
Accept-Ranges: bytes
X-Mod-Pagespeed: 0.9.15.3-404
Cache-Control: max-age=0, no-cache, no-store
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

[file processed, slight damage due to SSI, and my Cache-Control config 
overwritten, why?]


$ curl -I http://www.loupiote.com/css/menu.css
HTTP/1.1 200 OK
Date: Thu, 10 Mar 2011 08:01:13 GMT
Server: Apache/2.2.9 (Fedora)
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Length: 925
Connection: close
Content-Type: text/css

[not processed, why?]


$ curl -I http://www.loupiote.com/js/debug.js
HTTP/1.1 200 OK
Date: Thu, 10 Mar 2011 08:01:59 GMT
Server: Apache/2.2.9 (Fedora)
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Length: 568
Connection: close
Content-Type: application/x-javascript

[not processed, why?]


Original comment by loupi...@gmail.com on 10 Mar 2011 at 8:02

@GoogleCodeExporter
Copy link
Author

maybe the fact that my css and js are not processed has to do with that:

> mod_pagespeed rewrites and, effectively, proxies resources referenced in the 
main HTML document. It respects public caching headers, so if a resource is not 
explicitly marked public cacheable, mod_pagespeed will not rewrite nor re-serve 
it.

but then, why would it process my html files? (their Cache-Control is set - 
originally, before getting overwritten - to "private, max-age=3600".)

i don't understand...
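
One rough way to read the quoted rule, as a Python sketch. `parse_cache_control` and `is_publicly_cacheable` are hypothetical helpers, and the exact test mod_pagespeed applies is an assumption here; this only looks for an explicit `public` directive with no conflicting ones.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a dict of directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip()] = arg.strip() or None
    return directives

def is_publicly_cacheable(value):
    """Assumed rule: only resources explicitly marked public qualify."""
    directives = parse_cache_control(value)
    blocked = {"private", "no-cache", "no-store"}
    return "public" in directives and not blocked.intersection(directives)

print(is_publicly_cacheable("max-age=3600, public"))   # True
print(is_publicly_cacheable("private, max-age=3600"))  # False
print(is_publicly_cacheable(""))                       # False
```

Note that the quoted text talks about resources referenced from the HTML, so it would be consistent for the HTML page itself to be filtered whatever its own Cache-Control says.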

Original comment by loupi...@gmail.com on 10 Mar 2011 at 8:15

@GoogleCodeExporter
Copy link
Author

ohhh i think i understand now!

that's because the filtered css and js files are stored in the cache and must 
be fetched with a different URI!

Original comment by loupi...@gmail.com on 10 Mar 2011 at 5:26

@GoogleCodeExporter
Copy link
Author

well, still no luck.  i have added
 ModPagespeedAllow *.css

to my config, but the filtered html does not change the reference to the css:

<link type="text/css" rel="stylesheet" href="/css/all.css" />

is that because my CSS are processed by SSI? does that prevent mod_pagespeed 
from caching my css?

the source file of my all.css looks like:

<!-- @import url(http://www.google.com/cse/api/branding.css); -->
<!--#include virtual="/css/style.css" -->
<!--#include virtual="/css/menu.css" -->
<!--#include virtual="/css/mystyles.css" -->
<!--#include virtual="/css/js.css" -->

Original comment by loupi...@gmail.com on 10 Mar 2011 at 5:36

@GoogleCodeExporter
Copy link
Author

I'm still not 100% sure what's going on but I have some theories.

Theory 1:  We don't minify that CSS file because we can't parse it: it's not 
CSS.  But I would have guessed that SSI would have done the substitution on 
that file when mod_pagespeed fetched it.

Theory 2: mod_pagespeed is not able to fetch that .css file at all, so it can't 
rewrite it.  You can test this theory by adding the "extend_cache" filter, 
which doesn't attempt to parse or alter the .css content -- it just generates 
the signed URL.

If extend_cache succeeds where rewrite_css fails, then we're just not able to 
parse your CSS file for some reason.  If extend_cache fails, then you have a 
different issue, probably with your site configuration.

In either case, you can get more detail by setting your log level to 'info' and 
looking at your Apache logfile after refreshing your site a few times in your 
browser.
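
To check whether either filter touched the resource references, one option is to compare the URLs in the source HTML with those in the served HTML. The following is a minimal Python sketch; `resource_urls` and `unchanged_resources` are hypothetical helpers, and the rewritten URL used in the example is made up purely for illustration.

```python
from html.parser import HTMLParser

class ResourceCollector(HTMLParser):
    """Collect URLs from <link href=...> and <script src=...>."""
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("href"):
            self.urls.append(attrs["href"])
        elif tag == "script" and attrs.get("src"):
            self.urls.append(attrs["src"])

def resource_urls(html):
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    return collector.urls

def unchanged_resources(original_html, served_html):
    """Return resource URLs that appear identically in both documents."""
    served = set(resource_urls(served_html))
    return [u for u in resource_urls(original_html) if u in served]

original = ('<link type="text/css" rel="stylesheet" href="/css/all.css" />'
            '<script type="text/javascript" src="/js/all.js"></script>')
# Made-up rewritten URL, only to show a changed reference.
served = ('<link type="text/css" rel="stylesheet" href="/rewritten/all.css" />'
          '<script type="text/javascript" src="/js/all.js"></script>')

print(unchanged_resources(original, served))
```

Any URL this reports was left alone by the filters, which narrows down which resources to look for in the log.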

Original comment by jmara...@google.com on 10 Mar 2011 at 7:18

@GoogleCodeExporter
Copy link
Author

i did what you suggested, still no luck.

here is my config now:

        <IfModule pagespeed_module>
                  ModPagespeedFileCachePath            "/var/mod_pagespeed/cache/"
                  ModPagespeedGeneratedFilePrefix      "/var/mod_pagespeed/files/"
                  ModPagespeed on

                  # disable CoreFilters:
                  ModPagespeedRewriteLevel PassThrough

                  ModPagespeedEnableFilters collapse_whitespace
                  ModPagespeedEnableFilters rewrite_javascript
                  ModPagespeedEnableFilters rewrite_css
                  ModPagespeedEnableFilters extend_cache

                  ModPagespeedDomain *loupiote.com 

                  AddOutputFilterByType INCLUDES text/html
                  AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

                  ModPagespeedDisallow *
                  ModPagespeedAllow *.js
                  ModPagespeedAllow *.css
                  ModPagespeedAllow *833171367.shtml

        </IfModule> 


and here are the info messages i get with
$ curl http://www.loupiote.com/photos/833171367.shtml

[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: HtmlParse::StartParse
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:55: Unexpected close-tag `head', 
no tags are open
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:200: Invalid tag syntax: 
expected close tag before opener
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: 1456us: HtmlParse::Flush
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: 1472us: 
HtmlParse::CoalesceAdjacentCharactersNodes
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: 1498us: 
HtmlParse::ApplyFilter:CssFilter
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: 1529us: HtmlParse::SanityCheck
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: 1607us: 
HtmlParse::ApplyFilter:Javascript
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: 1777us: HtmlParse::SanityCheck
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: 1835us: 
HtmlParse::CoalesceAdjacentCharactersNodes
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: 1857us: 
HtmlParse::ApplyFilter:CollapseWhitespace
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: 1902us: 
HtmlParse::ApplyFilter:CacheExtender
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] Rewriting URL 
http://www.loupiote.com/photos_m/833171367-tristan-savatier.jpg is disallowed 
via configuration
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] Invalid resource 
url 'http://www.loupiote.com/photos_m/833171367-tristan-savatier.jpg' relative 
to 'http://www.loupiote.com/photos/833171367.shtml'
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: 2017us: 
HtmlParse::ApplyFilter:HtmlWriter
[Thu Mar 10 19:09:04 2011] [info] [mod_pagespeed 0.9.15.3-404] 
http://www.loupiote.com/photos/833171367.shtml:1: 2464us: HtmlParse::FinishParse

the 833171367.shtml file has the X-Mod-Pagespeed: 0.9.15.3-404, but besides 
that, it appears to be un-modified.

the css and js includes in the html file that i fetch are unchanged:
<link type="text/css" rel="stylesheet" href="/css/all.css" />
<script type="text/javascript" src="/js/all.js"></script>

here are the headers when fetching those files:

$ curl -I http://www.loupiote.com/photos/833171367.shtml
HTTP/1.1 200 OK
Date: Fri, 11 Mar 2011 00:17:13 GMT
Server: Apache/2.2.9 (Fedora)
Accept-Ranges: bytes
X-Mod-Pagespeed: 0.9.15.3-404
Cache-Control: max-age=0, no-cache, no-store
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

$ curl -I http://www.loupiote.com/css/all.css
HTTP/1.1 200 OK
Date: Fri, 11 Mar 2011 00:17:24 GMT
Server: Apache/2.2.9 (Fedora)
Accept-Ranges: bytes
Cache-Control: max-age=3600, public
Expires: Fri, 11 Mar 2011 01:17:24 GMT
Vary: Accept-Encoding
Connection: close
Content-Type: text/css

$ curl -I http://www.loupiote.com/js/all.js
HTTP/1.1 200 OK
Date: Fri, 11 Mar 2011 00:17:36 GMT
Server: Apache/2.2.9 (Fedora)
Accept-Ranges: bytes
Cache-Control: max-age=3600, public
Expires: Fri, 11 Mar 2011 01:17:36 GMT
Vary: Accept-Encoding
Connection: close
Content-Type: application/x-javascript

for some reason, mod_pagespeed is not doing anything to my file, besides a 
small modification that i described in comment #49.

and the log does not give any clue.

the error in the log "Unexpected close-tag `head', no tags are open" is once 
again the consequence of using SSI: the source file (before SSI expansion) 
looks like:

<!--#include virtual="/include/doctype.shtml" -->
<!--#include virtual="/include/head-start.shtml" -->

...stuff...
</head>

the <head> tag is in an SSI include.

so maybe this sort of parsing error (caused by parsing before doing the SSI 
processing) causes mod_pagespeed to just completely give-up on the file 
(besides making a small modification to it see comment #49) ?
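
The "Unexpected close-tag" message can be reproduced with a toy checker. This is a minimal Python sketch using the standard `html.parser`, not mod_pagespeed's parser; `CloseTagChecker` is a hypothetical helper, and the expanded markup is assumed, since the contents of the included files are not shown here.

```python
from html.parser import HTMLParser

class CloseTagChecker(HTMLParser):
    """Record close tags that arrive when no matching tag is open."""
    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.unexpected = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to the matching open tag.
            while self.open_tags.pop() != tag:
                pass
        else:
            self.unexpected.append(tag)

def unexpected_close_tags(html):
    checker = CloseTagChecker()
    checker.feed(html)
    checker.close()
    return checker.unexpected

before_ssi = ('<!--#include virtual="/include/doctype.shtml" -->\n'
              '<!--#include virtual="/include/head-start.shtml" -->\n'
              '<title>stuff</title>\n</head>')
# Assumed expansion: the includes supply the doctype and the opening tags.
after_ssi = '<!DOCTYPE html>\n<html><head>\n<title>stuff</title>\n</head>'

print(unexpected_close_tags(before_ssi))  # ['head']
print(unexpected_close_tags(after_ssi))   # []
```

Before SSI expansion the includes are just comments to the parser, so `</head>` has nothing to close.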

at that point i feel like i have exhausted the ideas. mod_pagespeed just does 
not work if the html file requires SSI processing in order to get a correct 
syntax, it seems.

Original comment by loupi...@gmail.com on 11 Mar 2011 at 12:26

@GoogleCodeExporter
Copy link
Author

i must correct my last comment.

with that configuration, mod_pagespeed does minify in-line scripts, but only 
those that are in the source page BEFORE the SSI are processed.

all the in-line scripts that are included by some SSI are not minified.

i.e. the in-line script minification occurs before SSI (so in my case, since 
most inline scripts are included by SSI, very little minification occurs).

and i still have no idea why the css and js files are not cached (and not 
minified).  maybe that has to do with the fact that they both rely on SSI to 
include multiple js files in a single "all.js" (and "all.css") file.

Original comment by loupi...@gmail.com on 11 Mar 2011 at 12:42

@GoogleCodeExporter
Copy link
Author

I am looking at this now.  If you specify this in your pagespeed.conf file then 
the filters get ordered correctly:


    AddOutputFilter INCLUDES;MOD_PAGESPEED_OUTPUT_FILTER html

You would do this in lieu of this directive that we put in pagespeed.conf at 
installation time:  "AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER 
text/html"

A lot of the comments on this thread describe symptoms that go away when this 
workaround is applied.  In particular, the 'includes' get applied before 
mod_pagespeed's 'remove_comments' transformation is applied, and the included 
text gets optimized by mod_pagespeed as well.
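
The effect of the filter order can be shown with a toy chain in Python. This is only a sketch: `includes_filter` and `remove_comments_filter` are hypothetical stand-ins for mod_include and the remove_comments filter, and the footer content is made up.

```python
import re

# Made-up content for the included file.
INCLUDES = {"/footer.html": "<p>footer <!-- internal note --></p>"}

def includes_filter(html):
    """Toy stand-in for mod_include: expand SSI include directives."""
    return re.sub(r'<!--#include\s+virtual="([^"]+)"\s*-->',
                  lambda m: INCLUDES[m.group(1)], html)

def remove_comments_filter(html):
    """Toy stand-in for remove_comments: strip every HTML comment."""
    return re.sub(r'<!--.*?-->', '', html, flags=re.DOTALL)

def run_chain(html, filters):
    """Apply output filters in the order given."""
    for output_filter in filters:
        html = output_filter(html)
    return html

page = '<body><!--#include virtual="/footer.html" --></body>'

wrong = run_chain(page, [remove_comments_filter, includes_filter])
right = run_chain(page, [includes_filter, remove_comments_filter])
print(wrong)  # <body></body>
print(right)  # <body><p>footer </p></body>
```

With the includes filter first, the directive is expanded and the comment inside the included text is also cleaned up, which matches the behaviour described above.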

I am leaving this bug open because I'd like to correct the filter order in 
mod_pagespeed if possible.  I'm still trying to figure out how to do that.

Original comment by jmara...@google.com on 14 Mar 2011 at 3:39

@GoogleCodeExporter
Copy link
Author

Nick/Ben from modules-dev@httpd.apache.org found a simple, obvious solution for 
re-ordering the modules that I had overlooked.  The fix is en route.

Note that you will still need to mark resources with SSI directives with 
Cache-Control:private to avoid them getting expanded & remembered the first 
time they are looked up.

Original comment by jmara...@google.com on 14 Mar 2011 at 6:24

  • Changed state: Accepted

@GoogleCodeExporter
Copy link
Author

this is fixed in trunk as of r558

Original comment by jmara...@google.com on 14 Mar 2011 at 7:53

  • Changed state: Fixed

@GoogleCodeExporter
Copy link
Author

Original comment by jmara...@google.com on 14 Mar 2011 at 7:54

  • Added labels: release-note

@GoogleCodeExporter
Copy link
Author

> AddOutputFilter INCLUDES;MOD_PAGESPEED_OUTPUT_FILTER html

in our case it should be:

AddOutputFilter INCLUDES;MOD_PAGESPEED_OUTPUT_FILTER html shtml

since the last parameter(s) is "extension", and .html is not the only extension 
for text/html type.
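
A small Python sketch of why the extension list matters. `has_extension` is a hypothetical helper that only models extension matching as case-insensitive and applied to every dot-separated extension of the file name; it is not Apache code.

```python
def has_extension(filename, extensions):
    """Return True if any dot-separated extension of filename is listed."""
    parts = filename.lower().split(".")[1:]
    wanted = {e.lower().lstrip(".") for e in extensions}
    return any(p in wanted for p in parts)

print(has_extension("index.html", ["html"]))                 # True
print(has_extension("833171367.shtml", ["html"]))            # False
print(has_extension("833171367.shtml", ["html", "shtml"]))   # True
```

So a directive listing only `html` would skip our .shtml pages.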

wouldn't it be better to use the following?

AddOutputFilterByType INCLUDES;MOD_PAGESPEED_OUTPUT_FILTER text/html

Original comment by loupi...@gmail.com on 15 Mar 2011 at 10:57

@GoogleCodeExporter
Copy link
Author

Correct.  Hopefully this will be a moot point soon as we are working on 
supporting SSI directly.

Original comment by jmara...@google.com on 15 Mar 2011 at 11:01

@GoogleCodeExporter
Copy link
Author

i tried those two:

AddOutputFilter INCLUDES;MOD_PAGESPEED_OUTPUT_FILTER shtml
and
AddOutputFilterByType INCLUDES;MOD_PAGESPEED_OUTPUT_FILTER text/html

in both cases, no luck, i.e. if i enable remove_comments, the SSI are not done 
in our shtml pages, i.e. the SSI are processed after the comments are removed. 
 so it's no good.

Original comment by loupi...@gmail.com on 15 Mar 2011 at 11:09

@GoogleCodeExporter
Copy link
Author

the issue is not fixed!

Original comment by loupi...@gmail.com on 15 Mar 2011 at 11:27

@GoogleCodeExporter
Copy link
Author

> this is fixed in trunk as of r558

Are you building from source or using a binary distribution?  We are working on 
a binary distribution but it isn't out yet.

Original comment by jmara...@google.com on 15 Mar 2011 at 11:30

@GoogleCodeExporter
Copy link
Author

maybe it depends on the order of statements in the apache config file?

in our case, we have

        Options +Includes

after the mod_pagespeed configuration.

Original comment by loupi...@gmail.com on 15 Mar 2011 at 11:30

@GoogleCodeExporter
Copy link
Author

ahhhh ok. i was using the public binary distribution (rpm). sorry for the 
misunderstanding.

however i won't be able to use mod_pagespeed for production because of issue 237, 
and also i have not yet been able to get the js and css to be processed and 
re-written, for some reason (still don't understand why).

Original comment by loupi...@gmail.com on 15 Mar 2011 at 11:36

@GoogleCodeExporter
Copy link
Author

This is fixed in binary release 0.9.16.9.

loupi...please see whether this resolves other issues you've brought up.  If 
there are still problems with your site that don't have Issue #s associated 
with them, please report them as separate issues or follow up to 
mod-pagespeed-discuss@googlegroups.com.

Original comment by jmara...@google.com on 17 Mar 2011 at 12:25

@GoogleCodeExporter
Copy link
Author

Original comment by sligocki@google.com on 21 Mar 2011 at 5:39

  • Removed labels: release-note
