
Add consistent caching layer for templates #1489

Closed
maruel opened this issue Mar 2, 2017 · 10 comments
Labels
feature ⚙️ New feature or request
Comments


maruel commented Mar 2, 2017

Problem statement

  • When a root statement occurs in a Caddyfile, the files in that directory are opened on every request, making serving much slower than necessary.
  • When using templates, all the files are cached at process startup, and modifications to the files are not detected.

This is different from issue #10, which is about caching external data, and from issue #390, which wasn't focused.

Goals

  • Reduce file I/O and GC pressure by keeping frequently accessed static files from disk in an in-memory cache.
  • Fix the template caching mechanism so that it detects when a file changes.

Action items

It is possible I misunderstood the design and that the action items are incorrect.


mholt commented Mar 3, 2017

Hey @maruel! Thanks for your feedback/proposal. I want to make sure I understand it completely.

When a root statement occurs in a Caddyfile, the files in that directory are opened on every request, making serving much slower than necessary.

This is true, although right now I'm leaning on OS/FS optimizations to make load times better. I think Go uses Unix sendfile, doesn't it?

When using templates, all the files are cached at process startup, and modifications to the files are not detected.

This... I don't believe this is true. Template files are loaded and parsed at request time, not startup; I was using templates yesterday and saving them and reloading the page was working great for me... 😶 Here's the relevant code: https://github.com/mholt/caddy/blob/0a0d2cc1cf12d58c5e38680689c78f18044288e6/caddyhttp/templates/templates.go#L48-L60
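
For context, the linked code boils down to parse-per-request along these lines. This is a paraphrased sketch, not the verbatim Caddy source; serveTemplate is an illustrative name:

```go
package templates

import (
	"html/template"
	"net/http"
	"os"
)

// serveTemplate reads and parses the template file on every request, so
// edits to the file show up on the next page load without a restart.
func serveTemplate(w http.ResponseWriter, r *http.Request, fpath string) error {
	buf, err := os.ReadFile(fpath)
	if err != nil {
		return err
	}
	tpl, err := template.New(fpath).Parse(string(buf))
	if err != nil {
		return err
	}
	return tpl.Execute(w, nil) // nothing is cached between requests
}
```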

So while I can see why the first goal would be a valuable feature in Caddy, does this render the second goal obsolete?

Anyway, if I understand the rest of your proposal right, we wouldn't need to touch templates (?) but just add a caching mechanism to the static file server. This could be configured (enabled) with a directive in the Caddyfile.

Questions:

  • Is cache eviction determined solely by file modification? Do we cap at a max size or age? I'm not a caching guru so I defer to the audience. :)
  • If this does touch more than just the static file server, then maybe a prerequisite of this feature is something I've kind of wanted for a long time: a standard, httpserver-provided utility function that loads a file from disk. Any middleware that wants to load a file from disk would call this function instead of doing it itself. That would be where we put the caching code (a hypothetical sketch follows this list).
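
For illustration, such a helper could look like the sketch below. This is a hypothetical design, not existing Caddy code; LoadFileCached and the mod-time-based invalidation are assumptions:

```go
package httpserver

import (
	"os"
	"sync"
	"time"
)

type cachedFile struct {
	data    []byte
	modTime time.Time
}

var (
	mu    sync.RWMutex
	files = make(map[string]cachedFile)
)

// LoadFileCached returns a file's contents, rereading from disk only when
// the modification time has changed since the last load.
func LoadFileCached(path string) ([]byte, error) {
	info, err := os.Stat(path)
	if err != nil {
		return nil, err
	}

	mu.RLock()
	entry, ok := files[path]
	mu.RUnlock()
	if ok && entry.modTime.Equal(info.ModTime()) {
		return entry.data, nil
	}

	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	mu.Lock()
	files[path] = cachedFile{data: data, modTime: info.ModTime()}
	mu.Unlock()
	return data, nil
}
```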

(This reminds me I want to propose a proxy caching feature as well. But that's for another issue.)

maruel changed the title from "Add low level caching layer for all files in root" to "Add consistent caching layer for templates" Mar 3, 2017

maruel commented Mar 3, 2017

Hi @mholt!

Clarifying a few things:

  • I ran strace on a trivial HTTP server implementation and was a bit confused by the output. So you are right: this is irrelevant for static files, at least on non-Windows. Let's ignore it for now.
  • markdown and templates have different caching semantics: templates reparse on the fly, while markdown loads all its templates at process startup. This is due to caddyhttp/markdown/setup.go#L105.

The current template caching behavior is surprising IMHO, so I renamed the issue to focus on templates.

Is cache eviction determined solely by file modification? Do we cap at a max size or age?

  • I'm thinking of fsnotify, so invalidation is based on OS notifications of file modifications; no polling involved.
  • Use a package like freecache to do the actual cache management. To be fair, I don't think most users have multi-megabyte templates, so memory pressure would be rare.
  • The compiled templates would be cached, not the raw template strings, to increase efficiency. (A sketch of this combination follows.)
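
A minimal sketch of how those points could fit together, using the fsnotify package; the names and structure here are illustrative, not an existing Caddy API:

```go
package templatecache

import (
	"html/template"
	"sync"

	"github.com/fsnotify/fsnotify"
)

type TemplateCache struct {
	mu      sync.RWMutex
	tmpls   map[string]*template.Template
	watcher *fsnotify.Watcher
}

func NewTemplateCache() (*TemplateCache, error) {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return nil, err
	}
	c := &TemplateCache{tmpls: make(map[string]*template.Template), watcher: w}
	go func() {
		// Drop the stale compiled template whenever the OS reports a change.
		for ev := range w.Events {
			if ev.Op&(fsnotify.Write|fsnotify.Remove|fsnotify.Rename) != 0 {
				c.mu.Lock()
				delete(c.tmpls, ev.Name)
				c.mu.Unlock()
			}
		}
	}()
	return c, nil
}

// Get returns the compiled template, parsing and watching it on first use.
func (c *TemplateCache) Get(path string) (*template.Template, error) {
	c.mu.RLock()
	t, ok := c.tmpls[path]
	c.mu.RUnlock()
	if ok {
		return t, nil
	}
	t, err := template.ParseFiles(path)
	if err != nil {
		return nil, err
	}
	if err := c.watcher.Add(path); err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.tmpls[path] = t
	c.mu.Unlock()
	return t, nil
}
```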

mholt added the feature ⚙️ New feature or request label Mar 3, 2017

maruel commented Mar 4, 2017

I migrated from markdown to using hugo directly, which means I'm not personally blocked on this anymore: I don't need to restart caddy when I modify the templates. I'm keeping the issue open, as this is still a serious deficiency in the markdown module IMHO.


mholt commented Mar 4, 2017

Thanks for clarifying. I'm beginning to agree: markdown should probably parse the templates at request time (and then cache them, as you're proposing; that should help with performance).

I'm booked up for a while but anyone is welcome to tackle this! If the templates and markdown directives share any logic, try to make the code common between them. Maybe caching is something that the httpserver provides for middleware.


mholt commented Oct 11, 2019

@maruel Good news, we've got an official cache middleware started here in the v2 branch (Caddy 2 is currently in beta): https://github.com/caddyserver/caddy/tree/v2/modules/caddyhttp/httpcache

It is suitable for any HTTP response, not just those from templates.

It's a working, simple PoC right now but I'd love some help getting it finished. Can you give it a try and we can work on it together?


maruel commented Oct 12, 2019

Hi! What would be the best avenue to discuss the design choices? I can't promise anything beyond that, but at the very least a discussion about the design trade-offs would be good.


mholt commented Oct 12, 2019

@maruel Either here, or if you want some tighter feedback, I can invite you to our dev Slack. As long as we summarize what we decide here it should be fine. Want me to send an email to the address on your GitHub profile?


maruel commented Oct 19, 2019

Observations:

  • I recommend against encoding/gob; instead use something that is not self-describing and uses flat memory, like flatbuffer, capnproto, etc. I don't have a preference on which.
  • Think about keeping the data pre-compressed, probably with gzip, and handle this explicitly so as not to break clients that don't support the same compression (sketched after this list). The advantages are:
    • Data is compressed only once for HTTP/1 clients supporting gzip.
    • Data takes less space in the cache.
    • Decompression is much faster than compression, so it's not a big deal to decompress for an HTTP/2 client and recompress on the fly.
  • Cache HEAD too.
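
A sketch of the pre-compression point: the body is compressed exactly once at cache-fill time, served verbatim to clients that accept gzip, and inflated on the fly for the rest. The names are illustrative, and real content negotiation would need proper q-value parsing of Accept-Encoding:

```go
package cache

import (
	"bytes"
	"compress/gzip"
	"io"
	"net/http"
	"strings"
)

type cachedResponse struct {
	gzBody []byte // compressed once, at cache-fill time
}

func (c *cachedResponse) serve(w http.ResponseWriter, r *http.Request) error {
	// Naive negotiation; real code must parse q-values in Accept-Encoding.
	if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
		w.Header().Set("Content-Encoding", "gzip")
		_, err := w.Write(c.gzBody)
		return err
	}
	// Decompression is cheap relative to compression, so inflating for
	// clients without gzip support costs little.
	zr, err := gzip.NewReader(bytes.NewReader(c.gzBody))
	if err != nil {
		return err
	}
	defer zr.Close()
	_, err = io.Copy(w, zr)
	return err
}
```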

Since it's fairly involved to get everything right, maybe file separate GitHub issues?


mholt commented Oct 19, 2019

@maruel

I recommend against encoding/gob; instead use something that is not self-describing and uses flat memory, like flatbuffer, capnproto, etc. I don't have a preference on which.
https://play.golang.org/p/G3EgQIDCABG shows that a trivial header gob-encodes to 153 bytes, which is relatively large overhead.

Good point; I'm totally in favor of this. I just whipped it up real quick as a PoC, and we need something better. Would anyone have a chance to do a quick comparison of which serialization format is most efficient? We don't necessarily need type information embedded in the serialization, since we have the very specific case of just a map of strings.

Honestly, we could probably even roll our own VERY simple serialization format that just concatenates the header fields. It's for internal use only, so a general-purpose serialization format might be overkill. (A sketch of that idea follows.)
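
For instance, a hand-rolled format could length-prefix each key and value, avoiding gob's per-stream type metadata entirely. A hypothetical sketch; Encode is an illustrative name and this wire format is not something anyone has agreed on:

```go
package headerenc

import (
	"bytes"
	"encoding/binary"
	"net/http"
)

// Encode writes each key/value pair as uvarint-length-prefixed strings,
// terminated by a zero-length key (header names are never empty).
func Encode(h http.Header) []byte {
	var buf bytes.Buffer
	var tmp [binary.MaxVarintLen64]byte
	writeStr := func(s string) {
		n := binary.PutUvarint(tmp[:], uint64(len(s)))
		buf.Write(tmp[:n])
		buf.WriteString(s)
	}
	for key, vals := range h {
		for _, v := range vals {
			writeStr(key)
			writeStr(v)
		}
	}
	writeStr("") // zero-length key terminates the stream
	return buf.Bytes()
}
```

Decoding would simply mirror this: read a uvarint length, then that many bytes, until a zero-length key is reached.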

Think about keeping the data pre-compressed, probably with gzip, and handle this explicitly so as not to break clients that don't support the same compression.

One advantage we have here is that the cache can be added anywhere in the middleware chain. We can, for instance, add it before the gzip handler, so that by the time the response comes back to the cache handler to be stored, it is already compressed. :) So this might already be done by default thanks to Caddy 2's architecture! (A generic sketch of this ordering follows.)
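
To make the ordering concrete, here is a generic net/http sketch, not Caddy 2's actual handler code: because the cache middleware wraps the gzip middleware, the bytes the cache captures on the way back out are already compressed.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"net/http"
)

// gzipMiddleware compresses every response. (Real code would negotiate
// Accept-Encoding and handle Content-Length; omitted to keep this short.)
func gzipMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Encoding", "gzip")
		gz := gzip.NewWriter(w)
		defer gz.Close()
		next.ServeHTTP(gzipResponseWriter{ResponseWriter: w, zw: gz}, r)
	})
}

type gzipResponseWriter struct {
	http.ResponseWriter
	zw *gzip.Writer
}

func (g gzipResponseWriter) Write(p []byte) (int, error) { return g.zw.Write(p) }

// cacheMiddleware captures whatever flows back through it. Because it sits
// outside gzipMiddleware below, the bytes it sees are already compressed.
func cacheMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var buf bytes.Buffer
		next.ServeHTTP(teeResponseWriter{ResponseWriter: w, buf: &buf}, r)
		_ = buf // buf now holds gzip-compressed bytes, ready to store
	})
}

type teeResponseWriter struct {
	http.ResponseWriter
	buf *bytes.Buffer
}

func (t teeResponseWriter) Write(p []byte) (int, error) {
	t.buf.Write(p)
	return t.ResponseWriter.Write(p)
}

func main() {
	origin := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello from the origin handler\n"))
	})
	// Outermost cache, then gzip, then the origin handler.
	http.ListenAndServe(":8080", cacheMiddleware(gzipMiddleware(origin)))
}
```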

Cache HEAD too.

Agreed, and specifically what is allowed to be cached needs to be very finely controlled and configurable, so we'll have to expand on this.

Since it's fairly involved to get everything right, maybe file separate GitHub issues?

Yes, definitely. Feel free to split this into a new issue to continue discussion on whatever part you think we should work on first!


mholt commented Apr 1, 2020

Moving discussion to new repo dedicated for the caching layer: caddyserver/cache-handler#1

mholt closed this as completed Apr 1, 2020