LoadFromFileCacheTtlMs does not control how often the file-system is re-checked. #1626

Open
dvershinin opened this issue Feb 22, 2019 · 7 comments

5 participants

dvershinin commented Feb 22, 2019

There are actually several issues I want to mention here.

First: one might assume that LoadFromFileCacheTtlMs makes ngx_pagespeed check files for changes only once per configured interval. However, this isn't the case, as it only controls the Cache-Control and Expires headers sent out.

ngx_pagespeed seems to check files for changes on every request, which leaves room for improvement. We want to stop those mtime checks on every request, as this would reduce I/O quite a lot on busy sites. Can we have a LoadFromFileCacheStatMs, please?

Second, I found LoadFromFileCacheTtlMs to behave quite strangely as it is (1.13.35.2-0). Mainly in my testing:

  • With pagespeed LoadFromFileCacheTtlMs < 1000, the first request results in cache-control: public, max-age=315360000, s-maxage=10, while the second and any further request to the optimized, cache-extended resource (the .pagespeed URL) yields cache-control: max-age=0,no-cache!

  • With pagespeed LoadFromFileCacheTtlMs >= 1000, the first request results in cache-control: max-age=1 (or 3, or some other small number of seconds), while the second and any further request to the optimized, cache-extended resource (the .pagespeed URL) yields cache-control: max-age=31536000

So multiple issues there:

  • Values lower than 1000 result in no cacheability at all (I have no idea why that number is the breaking point)
  • Values of 1000 or more result in cacheability, but there is no actual control over the value, since 31536000 seems to be hardcoded
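
To make the two observations above easier to compare, here is a minimal Python sketch that parses the Cache-Control values quoted in this report and classifies them as cacheable or not. The helper names `parse_cache_control` and `is_cacheable` are hypothetical and exist only for this illustration; the classification rule is a simplification, not a full implementation of HTTP caching semantics.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into {directive: value-or-None}."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip() if sep else None
    return directives


def is_cacheable(header: str) -> bool:
    """Simplified rule: reusable without revalidation for more than 0 seconds."""
    d = parse_cache_control(header)
    if "no-cache" in d or "no-store" in d:
        return False
    return int(d.get("max-age") or 0) > 0


# Header values copied from the observations in this report.
print(is_cacheable("public, max-age=315360000, s-maxage=10"))
print(is_cacheable("max-age=0,no-cache"))
print(is_cacheable("max-age=31536000"))
```

Note that 31536000 seconds is exactly 365 days, which matches a one-year lifetime for cache-extended resources.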

Member

oschaaf commented Feb 22, 2019

I think the gotcha here is that LoadFromFileCacheTtlMs controls cache TTL for the module's internal caching system, and does not control cache expiry directives at the HTTP level when optimized resources are sent over HTTP.
What I think you are observing at the HTTP level is that the output is either a cache-extended optimized resource with a 1 year TTL, or an intermediate output from the module where it didn't have an optimized resource ready yet. I think that retrying a couple of times will eventually get you the optimized output when that happens.

Author

dvershinin commented Feb 22, 2019

Well, that's what I thought:

I think the gotcha here is that LoadFromFileCacheTtlMs controls cache TTL for the module's internal caching system

But my test environment is an idle system with a single HTML file and a single CSS file. I can definitely tell when resources have already been optimized.

However, I can see that the mtime check happens any time I modify contents of the CSS file, as this reflects in a different hash of the .pagespeed CSS URL on next reload. So it doesn't seem like LoadFromFileCacheTtlMs plays a role in e.g. leaving the file completely alone for some time.

With LoadFromFileCacheTtlMs 999 (or anything below 1000, for that matter), not only do all subsequent reloads result in cache-control: max-age=0,no-cache on the .pagespeed resource (unless you assume I should wait 999 seconds to see a difference), but the resource is also not optimized (no minification), only cache-extended.

So maybe, just maybe :) the extend cache filter is somehow kicking in always whereas others respect the LoadFromFileCacheTtlMs before they do their stuff.

Contributor

jmarantz commented Feb 23, 2019

TL;DR: only use LoadFromFile on a local physical disk where stat() is cheap. Never use LoadFromFile on a mounted file system.

RE stat() overhead per-request: your observation is spot-on, and reflects the intended design, and does result in a stat() call on each resource every time it is referenced in an HTML file.

This is intended as an alternative to using HTTP-fetching and a file-cache. It avoids the HTTP fetch (and also side-steps any issues you might have with HTTPS fetching). A tradeoff is that it doesn't have access to the HTTP origin headers for your assets, so it doesn't know how often to re-check to see if the origin asset has changed. This also means that changes to assets take place immediately; they don't need to expire out of cache.

If stat() takes a long time (e.g. it's a mounted file system) then definitely don't use LoadFromFile; use HTTP fetching so we can get cache TTLs and periodic checking of how up-to-date the contents are, based on the origin TTL that you control via normal HTTP caching headers.
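
As a rough illustration of the tradeoff described above, the following Python sketch contrasts a stat() on every reference with a stat() that is throttled to at most once per interval. This is not PageSpeed's implementation. The class `StatThrottledFile` and its `stat_interval_ms` parameter are hypothetical, loosely modeled on the LoadFromFileCacheStatMs option requested at the top of this issue, which does not exist.

```python
import os
import time


class StatThrottledFile:
    """Re-stat a path at most once per `stat_interval_ms` (hypothetical knob).

    With stat_interval_ms=0 every call hits the file system, which mirrors
    the per-reference stat() behaviour described in this thread.
    """

    def __init__(self, path, stat_interval_ms=0, clock=time.monotonic):
        self.path = path
        self.stat_interval_s = stat_interval_ms / 1000.0
        self.clock = clock  # injectable so the behaviour can be tested
        self.stat_calls = 0
        self._last_check = None
        self._mtime_ns = None

    def mtime_ns(self):
        now = self.clock()
        fresh = (
            self._last_check is not None
            and now - self._last_check < self.stat_interval_s
        )
        if not fresh:
            # The only place that touches the file system.
            self._mtime_ns = os.stat(self.path).st_mtime_ns
            self._last_check = now
            self.stat_calls += 1
        return self._mtime_ns
```

The cost of throttling is the one noted above: a change to the file may go unnoticed for up to one interval, whereas a stat() on every reference picks it up immediately.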

If you say LoadFromFileCacheTtlMs 999 you are saying the origin assets are valid for less than a second. This is not a scenario PageSpeed was designed for, but I admit the handling could be better -- e.g. just do nothing with the asset.

jmarantz changed the title from "LoadFromFileCacheTtlMs is quite broken" to "LoadFromFileCacheTtlMs does not control how often the file-system is re-checked." on Feb 23, 2019

luison commented Mar 7, 2019

Hi. Sorry to jump into this specific issue, but I am trying to understand how pagespeed refreshes its source files when using the extend_cache filter, and how that relates to LoadFromFileCacheTtlMs.

I posted a question on the google group (https://goo.gl/SgCWBj) but I'll try to briefly ask my question here as I am about to give up on it.

We are trying to improve an nginx cache proxy in front of various apache servers running in containers. Our setup was intercepting static content (css, js, images, etc.), extending cache headers, proxying and caching:

client --> nginx proxy cache for css, js, images.. --> backend apache
client --> nginx proxy (no cache) for the rest --> backend apache

Our intention now is to add pagespeed to improve things in general, and CSS and JS in particular, by concatenating (css + js) and properly versioning to extend caches, which I understand is exactly what the extend_cache filter does. (A list of our active filters is below.)

Our problem is that some CSS and JS files, once processed by pagespeed, do not get updated after a change and remain stale. This happens even if we've:

  1. forced the Expires headers of the apache backend to 30 seconds. Would pagespeed use this value at all as a TTL to recheck the backend, or will it use the one returned by the nginx cache?
  2. forced the nginx proxy cache for static files to bypass the cache and set it to no-store for testing. Again, is pagespeed using this or the backend Expires?
  3. cleared the cache for the specific file (i.e. /ass/skins/def/css/bootstrap.min.css+main.gis.css.pagespeed.cc.7FZQ-V4k-L.css) via /pagespeed_admin. Only when clearing the whole pagespeed cache does the file get updated.

So a file such as /ass/skins/def/css/bootstrap.min.css+main.gis.css.pagespeed.cc.7FZQ-V4k-L.css stays the same after any change on the backend.

This is what the cache reports for that path:

Metadata cache key:rname/cc_A5CWJ1Ij7nG9n_XUZo0i/t/vozrWfLE89IPR4eF0bW@
cache_ok:true
can_revalidate:false
partitions:partition {
  optimizable: true
  url: "https://xxxxxx/ass/skins/def/css/bootstrap.min.css+main.gis.css.pagespeed.cc.7FZQ-V4k-L.css"
  input {
    index: 0
    type: CACHED
    last_modified_time_ms: 1547041050000
    expiration_time_ms: 1554586471000
    date_ms: 1551994471000
    input_content_hash: "7Du1KgDhdqcYHUVN_66iG"
    url: "https://xxxxxxx/ass/skins/def/css/bootstrap.min.css"
  }
  input {
    index: 1
    type: CACHED
    last_modified_time_ms: 1551993894000
    expiration_time_ms: 1554586471000
    date_ms: 1551994471000
    input_content_hash: "UGK37FuBTQjWD1AQi_GCT"
    disable_further_processing: true
    url: "https://xxxxxxxxxxxxxxxxxxxxxxxxxxx/ass/skins/def/css/main.gis.css"
  }
}
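
The timestamps in the dump above can be decoded with a short Python sketch. It assumes that `date_ms` and `expiration_time_ms` are Unix epoch milliseconds, which the values suggest but the dump itself does not state. The helper `ms_to_utc` is a hypothetical name used only here.

```python
from datetime import datetime, timedelta, timezone


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Values copied from both `input` entries in the dump above.
date_ms = 1551994471000
expiration_time_ms = 1554586471000

ttl = timedelta(milliseconds=expiration_time_ms - date_ms)

print("fetched:", ms_to_utc(date_ms).isoformat())
print("expires:", ms_to_utc(expiration_time_ms).isoformat())
print("lifetime:", ttl)  # 30 days
```

Under that assumption, both inputs were recorded with a 30-day lifetime rather than the 30 seconds set on the backend, which may help narrow down which layer's headers pagespeed actually saw.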

Even after clicking "delete" on that, no change. Only once I clear the whole pagespeed cache through the backend I get an updated version.

I'm probably not understanding something:

  • do I require additional setup if using nginx proxy cache as in doc: https://www.modpagespeed.com/doc/downstream-caching.html considering this scenario. I am happy with html not being cached.
  • does pagespeed "read" files from nginx-cache-proxy or is it aware of the backend upstream too?
  • Would setting a lower value for LoadFromFileCacheTtlMs change anything? I understand the default is 5 minutes if no Expires headers are set. What if they are? Does the TTL become the Expires of the source fetched over HTTP from the backend or, as I suspect, does this value only apply to reads from "local" disk?
  • should I just make nginx act as non-cache proxy and let pagespeed deal completely with that?

Thanks.

Active Filters
ah Add Head
ai Add Instrumentation
cc Combine Css
jc Combine Javascript
gp Convert Gif to Png
jp Convert Jpeg to Progressive
jw Convert Jpeg To Webp
mc Convert Meta Tags
pj Convert Png to Jpeg
ws When converting images to WebP, prefer lossless conversions
ec Cache Extend Css
ei Cache Extend Images
es Cache Extend Scripts
fc Fallback Rewrite Css
if Flatten CSS Imports
hw Flushes html
ci Inline Css
ii Inline Images
il Inline @import to Link
ji Inline Javascript
idp Insert DNS Prefetch
js Jpeg Subsampling
cj Move Css Above Scripts
pr Prioritize Critical Css
rj Recompress Jpeg
rp Recompress Png
rw Recompress Webp
ri Resize Images
cf Rewrite Css
jm Rewrite External Javascript
jj Rewrite Inline Javascript
cu Rewrite Style Attributes With Url
cp Strip Image Color Profiles
md Strip Image Meta Data

Lofesa commented Mar 10, 2019

Hi @luison
I don't think I can answer all of these questions, because this is a scenario I haven't used.
But I'll try:
Where is pagespeed installed? In the apache backend or in the nginx proxy?
I think pagespeed reads the TTL (the Cache-Control header) set in the web server it is running in.
The origin resource's TTL is used by pagespeed to set the TTL of the optimized resources in the pagespeed cache when pagespeed fetches the resource via HTTP. When the resource is loaded from file, there are no TTL headers (in fact, load from file has no headers at all), so you must configure a default TTL for resources loaded from file.

Deleting the metadata cache entry doesn't delete the optimized resource from the cache.

luison commented Mar 10, 2019

Thanks @Lofesa, pagespeed is installed on the nginx proxy_cache. Thanks for the info regarding metadata but the issue remains. Actually even removing all cache services now and just using nginx as a proxy to our apache backend, issue remains, so trying to figure what else might be wrong in our setup.

Lofesa commented Mar 11, 2019

Hi
If you are running the module in the nginx cache, I think you can't use LoadFromFile; you need to fetch resources over HTTP, and according to the doc you must exclude optimized resources (those that have .pagespeed.xx.hash.ext in the URL) from the proxy cache.
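
To check which URLs would fall under that exclusion, a small Python sketch can test paths against a pattern for the .pagespeed.xx.hash.ext shape. The regular expression below is an assumption based on the URL form described above (two-letter filter code, 10-character hash, extension), so verify it against the documentation before relying on it. `is_pagespeed_resource` is a hypothetical helper name.

```python
import re

# Assumed shape: .pagespeed.<optional 1-letter option>.<2-letter filter>.<10-char hash>.<ext>
PAGESPEED_RESOURCE = re.compile(
    r"\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+"
)


def is_pagespeed_resource(url_path: str) -> bool:
    """True if the path looks like a PageSpeed-rewritten resource URL."""
    return PAGESPEED_RESOURCE.search(url_path) is not None


# Paths taken from earlier in this thread.
print(is_pagespeed_resource(
    "/ass/skins/def/css/bootstrap.min.css+main.gis.css.pagespeed.cc.7FZQ-V4k-L.css"
))
print(is_pagespeed_resource("/ass/skins/def/css/main.gis.css"))
```

Paths that match are the ones to keep out of the proxy cache; the original assets that do not match can still be cached as before.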
