Provide easier mechanism to flush server cache. #133

Closed
GoogleCodeExporter opened this Issue Apr 6, 2015 · 16 comments

GoogleCodeExporter commented Apr 6, 2015

What steps will reproduce the problem?
1. Use an image with 1 hour TTL
2. Load HTML page with the image until it is rewritten
3. Change the image
4. Reload HTML page and notice that image hasn't changed (probably cached 
locally)
5. Shift-Reload page and notice that image still hasn't changed (probably still 
in mod_pagespeed cache)

What is the expected output? What do you see instead?

Without mod_pagespeed, Shift-Reload will flush the cache. It would be nice if 
mod_pagespeed also did that.

Please use labels and text to provide additional information.

From simple experimentation on Chrome, it appears that using Shift-Reload:
* changes "Cache-Control: max-age=0" -> "Cache-Control: no-cache" and 
* adds "Pragma: no-cache"

If we see these perhaps we should flush our cache (at least for these files).
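
A minimal sketch of the header check suggested above, written in Python purely for illustration. It is not mod_pagespeed code; the function name `is_forced_reload` and the plain-dict representation of request headers are assumptions made for this example.

```python
def is_forced_reload(headers):
    """Return True if the request headers ask for an end-to-end reload.

    `headers` is a plain dict of header names to values (a hypothetical
    representation, not a mod_pagespeed type).
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # Cache-Control may carry several comma-separated directives.
    cache_control = normalized.get("cache-control", "")
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-cache" in directives:
        return True

    # "Pragma: no-cache" is the HTTP/1.0 way of asking for the same thing.
    pragma = normalized.get("pragma", "")
    return "no-cache" in [p.strip().lower() for p in pragma.split(",")]
```

A plain reload that only sends "Cache-Control: max-age=0" is not treated as a forced reload by this check; only the Shift-Reload headers listed above are.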

Original issue reported on code.google.com by sligocki@google.com on 4 Dec 2010 at 6:53

GoogleCodeExporter commented Apr 6, 2015

Note that if we had local file mapping we could just avoid caching + 
re-fetching for those files, and check up on the origin resource before serving 
a rewritten resource from cache.  But this isn't a general solution.

Original comment by jmaes...@google.com on 5 Dec 2010 at 3:10

GoogleCodeExporter commented Apr 6, 2015

From http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

End-to-end reload
The request includes a "no-cache" cache-control directive or, for compatibility 
with HTTP/1.0 clients, "Pragma: no-cache". Field names MUST NOT be included 
with the no-cache directive in a request. The server MUST NOT use a cached copy 
when responding to such a request.

Of course, we're not exactly a cache, but the more we act like one, the less 
confusing we'll be.

Original comment by sligocki@google.com on 27 Dec 2010 at 11:56

GoogleCodeExporter commented Apr 6, 2015

Original comment by sligocki@google.com on 3 Jan 2011 at 9:18

  • Changed state: Started

GoogleCodeExporter commented Apr 6, 2015

I think this fits in with another issue I have - in that if you alter a css 
file, mod_pagespeed does not notice that it has been updated since it was 
copied to the cache - this makes site development very difficult, as you either 
have to switch off mod_pagespeed or you have to reset the apache server!

Original comment by rwap.services on 2 Feb 2011 at 11:46

GoogleCodeExporter commented Apr 6, 2015

Unfortunately, this is on hold because it could break our caching policy. 
Hopefully we can figure out a way to make this work.

Original comment by sligocki@google.com on 8 Feb 2011 at 4:26

GoogleCodeExporter commented Apr 6, 2015

Issue 257 has been merged into this issue.

Original comment by sligocki@google.com on 31 Mar 2011 at 3:17

GoogleCodeExporter commented Apr 6, 2015

Summary was: Respect browser forced cache break

If there are no objections I think I'm going to hijack this bug to simply make 
a much easier way to flush cache server-side, in lieu of the current hack in 
http://code.google.com/p/modpagespeed/wiki/FAQ#How_do_I_clear_the_cache_on_my_server?

This will make it easier to develop sites without constantly restarting servers 
per that FAQ entry.


The proposed mechanism is to simply touch a file, e.g.
   touch /var/pagespeed/cache.flush
and within a minute the Apache server will have flushed the cache.

In a multi-server setup you'd just need to do:


foreach host (host1 host2 host3 host4)
   ssh $host touch /var/pagespeed/cache.flush
end


Also note the ModPagespeedLoadFromFile option which provides a much better 
development experience without requiring cache flushes.

Original comment by jmara...@google.com on 5 Apr 2012 at 6:07

  • Changed title: Provide easier mechanism to flush server cache.

GoogleCodeExporter commented Apr 6, 2015

This is a nice solution. Note that it won't allow invalidating individual files 
like some folks have been requesting.

Original comment by sligocki@google.com on 5 Apr 2012 at 6:34

GoogleCodeExporter commented Apr 6, 2015

I don't have a good solution for invalidating individual files but I think we 
can do it at VirtualHost granularity.

Original comment by jmara...@google.com on 6 Apr 2012 at 11:44

GoogleCodeExporter commented Apr 6, 2015

if i may offer a suggestion:

would it be an idea to prepend the cache keys with an internal 'version' number 
just for this? 
so initially, looking up resource '/js/jquery.js' would be done with the 
original url, until a user invalidates it somehow (maybe something like 'curl 
-X PURGE http://foo.bar/js/jquery.js').

these purges will need to be stored somewhere. after that, lookups that match 
these purge expression(s) would be rewritten to 
'v:1|http://foo.bar/js/jquery.js'. 
not exactly a flush, but the affected cache key(s) would never be hit again.
after the purge, a lazy action should probably be started to get rid of the 
'version(s)', so they don't have to be remembered forever. after that finishes, 
things proceed 'as normal'

maybe even wildcards could be done this way (PURGE http://foo.bar/js/*)

I only had a look at lru_cache.cc, so i might be completely on the wrong foot 
here :)
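
A rough sketch of the versioned-key idea above, in Python for illustration only. The class `PurgeRegistry`, its methods, and the choice to sum the versions of all matching patterns are assumptions of this example, not anything that exists in mod_pagespeed. Wildcards are matched with the standard `fnmatch` module.

```python
import fnmatch


class PurgeRegistry:
    """Tracks purge patterns and derives versioned cache keys (hypothetical)."""

    def __init__(self):
        # Maps a purge pattern (exact URL or wildcard) to its version number.
        self._versions = {}

    def purge(self, pattern):
        """Record a purge; purging the same pattern again bumps its version."""
        self._versions[pattern] = self._versions.get(pattern, 0) + 1

    def cache_key(self, url):
        """Return the key to use for cache lookups of `url`."""
        # Sum the versions of every matching pattern, so that any new purge
        # affecting this URL changes its key.
        version = sum(v for pattern, v in self._versions.items()
                      if fnmatch.fnmatchcase(url, pattern))
        if version == 0:
            return url  # never purged: look up under the original URL
        return "v:%d|%s" % (version, url)
```

For example, after `purge("http://foo.bar/js/*")` a lookup of `http://foo.bar/js/jquery.js` uses the key `v:1|http://foo.bar/js/jquery.js`, while URLs outside `/js/` keep their original keys. The lazy cleanup step described above is not modeled here; it would have to delete the stale entries before a pattern is forgotten, otherwise old keys could become reachable again.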

Original comment by osch...@gmail.com on 27 Apr 2012 at 9:48

GoogleCodeExporter commented Apr 6, 2015

Progress: I have a change under code-review which implements cache-flushing at 
a VirtualHost granularity.

Oschaaf, this is an interesting idea.  There are two complexities in our system.

  1. In Apache we use a 2-level cache.  L1 is a per-child-process in-memory
     cache (you saw it in lru_cache.cc).  L2 is a file-based cache
     (file_cache.cc).  So requesting a flush via an HTTP request will have to
     hit the L1s of all the Apache child processes.

  2. As for the per-file flushes: this is not a problem for flushing our HTTP cache
     (logical usage of our physical L1/L2 cache).  However, our meta-data cache is
     more complicated because we cache the mapping of (for example) multiple CSS
     files to the filename they combine into.  So it's hard to identify all the metadata
     cache-keys that incorporate a .css file.

Neither of these is a show-stopper issue and we may eventually have 
per-resource or wildcard-based flush capability.  We've had some architectures 
in mind similar to what you described.


But for now we'll just support a total VirtualHost-scoped flush based on 
timestamp.  The L1/L2 cache problem is resolved by having each process 
periodically poll for the cache-flush file.
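
The timestamp-plus-polling approach described above can be sketched roughly as follows. This is Python for illustration, not the actual implementation; the class `FlushPoller`, its field names, and the explicit `now` argument are assumptions of this sketch. Each process would hold its own instance, which is how the per-process L1 caches would all learn about a flush without any cross-process messaging.

```python
import os


class FlushPoller:
    """Per-process view of the cache-flush file (illustrative only)."""

    def __init__(self, flush_path, poll_interval_s=5.0):
        self.flush_path = flush_path
        self.poll_interval_s = poll_interval_s
        # Cache entries written at or before this time are treated as stale.
        self.invalidation_ts = 0.0
        self._last_poll = None

    def maybe_poll(self, now):
        """Re-read the flush file's mtime if the poll interval has elapsed."""
        if (self._last_poll is not None
                and now - self._last_poll < self.poll_interval_s):
            return
        self._last_poll = now
        try:
            mtime = os.stat(self.flush_path).st_mtime
        except FileNotFoundError:
            return  # no flush has been requested yet
        self.invalidation_ts = max(self.invalidation_ts, mtime)

    def is_valid(self, entry_write_ts, now):
        """A cache entry is usable only if it was written after the last flush."""
        self.maybe_poll(now)
        return entry_write_ts > self.invalidation_ts
```

Nothing is deleted eagerly in this sketch: touching the file only moves the invalidation timestamp forward, and older entries simply stop being served once each process has polled.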

Original comment by jmara...@google.com on 27 Apr 2012 at 12:48

GoogleCodeExporter commented Apr 6, 2015

Ok, understood. In my implementation, I currently have no real virtual host 
concept (*maybe* only mapping rules for incoming/outgoing URLs in a reverse 
proxy implementation, but it could be transparent proxying too), and it's always 
a single process. So my problem is that in my current implementation your 
plan would result in a global flush. I'll read up some more on your codebase 
and see if I can roll my own implementation :) Thanks for the heads up.

Original comment by osch...@gmail.com on 28 Apr 2012 at 8:25

GoogleCodeExporter commented Apr 6, 2015

Fixed in http://code.google.com/p/modpagespeed/source/detail?r=1544

With this change, doing:

  touch CACHE_DIRECTORY/cache.flush

will flush the mod_pagespeed cache within 5 seconds.


I'm leaving this bug open for now as there are some configuration tweaks I want 
to follow-up on.

Original comment by jmara...@google.com on 2 May 2012 at 5:27

  • Added labels: release-note

GoogleCodeExporter commented Apr 6, 2015

@jmaranz: cool! Looking at the diff from your commit, it seems like I would be 
able to use RewriteOptions::UpdateCacheInvalidationTimestampMs selectively when 
I match a purge expression, can't I? That timestamp is now seeding the hashes 
used? If so, that would allow me to add something similar for purging at the 
domain level in my port too.

Another question about this:
is it a stupid idea to insert a pre-render filter that adds a 
generation/version to the query string of every resource URL to accomplish 
this? In my fetcher, I would remove this version before querying the origins. 
That way, I would be able to match purge expressions in the pre-render filter 
and rewrite resource URLs selectively. I know this is very rough around the 
edges, but it seems very simple to implement, and I wonder if I am missing any 
implications (other than slightly longer URLs in the rewritten HTML).
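
A small sketch of the query-string versioning idea above, in Python for illustration. The parameter name `psv` and both function names are made up for this example; nothing here reflects an existing mod_pagespeed option.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

VERSION_PARAM = "psv"  # hypothetical parameter name, chosen for this sketch


def _query_without_version(parts):
    """Return the query as (name, value) pairs with any version param removed."""
    return [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != VERSION_PARAM]


def add_version(url, version):
    """Filter side: tag a resource URL with the current purge version."""
    parts = urlsplit(url)
    query = _query_without_version(parts)
    query.append((VERSION_PARAM, str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def strip_version(url):
    """Fetcher side: remove the version before contacting the origin."""
    parts = urlsplit(url)
    query = _query_without_version(parts)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

One implication worth checking: `urlencode` re-encodes the whole query string, so for unusual queries (for example a bare `?flag` with no `=`) the stripped URL may not be byte-identical to the original, and a site that already uses the chosen parameter name would lose it.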

Original comment by osch...@gmail.com on 3 May 2012 at 10:06

GoogleCodeExporter commented Apr 6, 2015

Sorry Oschaaf, never replied to your comment.  You may want to take this to the 
discussion group https://groups.google.com/group/mod-pagespeed-discuss?pli=1 as 
I'm about to do the last changes needed to close this bug and I won't look at 
it after that :)

The suggestion you are making sounds plausible but I haven't had much chance to 
think about it deeply.  I'm not too worried about the complexity of flushing 
the cache of origin resources.  The main thing that's complicated is flushing 
the metadata cache, and in particular, flushing the metadata cache entries for 
Combiners (e.g. combine_css) where one of the files got flushed.
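
To make the combiner problem concrete, here is one possible way to track it, sketched in Python. This is an assumption for illustration, not how mod_pagespeed stores its metadata: the class `MetadataIndex`, the `|`-joined key format, and the reverse index are all hypothetical. The point is that flushing one input file means finding every metadata entry that incorporates it, which needs some record of which inputs went into which entry.

```python
from collections import defaultdict


class MetadataIndex:
    """Reverse index from input URLs to metadata cache keys (hypothetical)."""

    def __init__(self):
        # metadata key -> (tuple of input URLs, combined output name)
        self.metadata = {}
        # input URL -> set of metadata keys that incorporate it
        self._keys_by_input = defaultdict(set)

    def record_combination(self, input_urls, output_name):
        """Remember that `input_urls` were combined into `output_name`."""
        key = "|".join(input_urls)  # assumed key format, for illustration
        self.metadata[key] = (tuple(input_urls), output_name)
        for url in input_urls:
            self._keys_by_input[url].add(key)
        return key

    def flush_input(self, url):
        """Drop every metadata entry that incorporates `url`; return their keys."""
        dropped = self._keys_by_input.pop(url, set())
        for key in dropped:
            inputs, _ = self.metadata.pop(key)
            # Keep the reverse index consistent for the other inputs.
            for other in inputs:
                if other != url:
                    self._keys_by_input[other].discard(key)
        return dropped
```

Without an index like this, the only safe option is to invalidate the whole metadata cache, which is what a timestamp-based flush does.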

Original comment by jmara...@google.com on 17 May 2012 at 8:41

GoogleCodeExporter commented Apr 6, 2015

New options for controlling cache-flush polling & filename.

http://code.google.com/p/modpagespeed/source/detail?r=1581

Original comment by jmara...@google.com on 18 May 2012 at 2:37

  • Changed state: Fixed
  • Added labels: Milestone-v22