
Add consistent caching layer for templates #1489

Closed
maruel opened this issue Mar 2, 2017 · 10 comments
Labels
feature ⚙️ New feature or request
Comments


maruel commented Mar 2, 2017

Problem statement

  • When a root statement occurs in a Caddyfile, the files in that directory are opened on every request, making serving much slower than necessary.
  • When using templates, all the files are cached at process startup, and modifications to the files are not detected.

This is different from issue #10, which is about caching external data, and from issue #390, which wasn't focused.

Goals

  • Reduce file I/O and GC pressure by keeping frequently accessed static files from disk in an in-memory cache.
  • Fix the template caching mechanism so that it detects when a file changes.

Action items

It is possible I misunderstood the design and that the action items are incorrect.


mholt commented Mar 3, 2017

Hey @maruel! Thanks for your feedback/proposal. I want to make sure I understand it completely.

When a root statement occurs in a Caddyfile, the files in that directory are opened on every request, making serving much slower than necessary.

This is true, although right now I'm leaning on OS/FS optimizations to make load times better. I think Go uses Unix sendfile, doesn't it?

When using templates, all the files are cached at process startup, and modifications to the files are not detected.

This... I don't believe this is true. Template files are loaded and parsed at request time, not startup; I was using templates yesterday and saving them and reloading the page was working great for me... 😶 Here's the relevant code: https://github.com/mholt/caddy/blob/0a0d2cc1cf12d58c5e38680689c78f18044288e6/caddyhttp/templates/templates.go#L48-L60
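
For context, the linked code boils down to parse-per-request along these lines. This is a paraphrased sketch, not the verbatim Caddy source; serveTemplate is an illustrative name:

```go
package templates

import (
	"html/template"
	"net/http"
	"os"
)

// serveTemplate reads and parses the template file on every request, so
// edits to the file show up on the next page load without a restart.
func serveTemplate(w http.ResponseWriter, r *http.Request, fpath string) error {
	buf, err := os.ReadFile(fpath)
	if err != nil {
		return err
	}
	tpl, err := template.New(fpath).Parse(string(buf))
	if err != nil {
		return err
	}
	return tpl.Execute(w, nil) // nothing is cached between requests
}
```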

So while I can see why the first goal would be a valuable feature in Caddy, does this render the second goal obsolete?

Anyway, if I understand the rest of your proposal right, we wouldn't need to touch templates (?) but just add a caching mechanism to the static file server. This could be configured (enabled) with a directive in the Caddyfile.

Questions:

  • Is cache eviction determined solely by file modification? Do we cap at a max size or age? I'm not a caching guru so I defer to the audience. :)
  • If this does touch more than just the static file server, then maybe a prerequisite of this feature is something I've kind of wanted for a long time: a standard, httpserver-provided utility function that loads a file from disk. Any middleware that wants to load a file from disk would call this function instead of doing it itself. That would be where we put the caching code (a hypothetical sketch follows this list).
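
For illustration, such a helper could look like the sketch below. This is a hypothetical design, not existing Caddy code; LoadFileCached and the mod-time-based invalidation are assumptions:

```go
package httpserver

import (
	"os"
	"sync"
	"time"
)

type cachedFile struct {
	data    []byte
	modTime time.Time
}

var (
	mu    sync.RWMutex
	files = make(map[string]cachedFile)
)

// LoadFileCached returns a file's contents, rereading from disk only when
// the modification time has changed since the last load.
func LoadFileCached(path string) ([]byte, error) {
	info, err := os.Stat(path)
	if err != nil {
		return nil, err
	}

	mu.RLock()
	entry, ok := files[path]
	mu.RUnlock()
	if ok && entry.modTime.Equal(info.ModTime()) {
		return entry.data, nil
	}

	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	mu.Lock()
	files[path] = cachedFile{data: data, modTime: info.ModTime()}
	mu.Unlock()
	return data, nil
}
```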

(This reminds me I want to propose a proxy caching feature as well. But that's for another issue.)

maruel changed the title from "Add low level caching layer for all files in root" to "Add consistent caching layer for templates" Mar 3, 2017

maruel commented Mar 3, 2017

Hi @mholt!

Clarifying a few things:

  • I ran strace on a trivial HTTP server implementation and was a bit confused by the output. So you are right: this is irrelevant for static files, at least on non-Windows. Let's ignore it for now.
  • markdown and templates have different caching semantics: templates reparse on the fly, while markdown loads all its templates at process startup. This is due to caddyhttp/markdown/setup.go#L105.

The current template caching behavior is surprising IMHO, so I renamed the issue to focus on templates.

Is cache eviction determined solely by file modification? Do we cap at a max size or age?

  • I'm thinking of fsnotify, so invalidation is based on OS notifications of file modifications; no polling involved.
  • Use a package like freecache to do the actual cache management. To be fair, I don't think most users have multi-megabyte templates, so memory pressure would be rare.
  • The compiled templates would be cached, not the raw template strings, to increase efficiency. (A sketch of this combination follows.)
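
A minimal sketch of how those points could fit together, using the fsnotify package; the names and structure here are illustrative, not an existing Caddy API:

```go
package templatecache

import (
	"html/template"
	"sync"

	"github.com/fsnotify/fsnotify"
)

type TemplateCache struct {
	mu      sync.RWMutex
	tmpls   map[string]*template.Template
	watcher *fsnotify.Watcher
}

func NewTemplateCache() (*TemplateCache, error) {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return nil, err
	}
	c := &TemplateCache{tmpls: make(map[string]*template.Template), watcher: w}
	go func() {
		// Drop the stale compiled template whenever the OS reports a change.
		for ev := range w.Events {
			if ev.Op&(fsnotify.Write|fsnotify.Remove|fsnotify.Rename) != 0 {
				c.mu.Lock()
				delete(c.tmpls, ev.Name)
				c.mu.Unlock()
			}
		}
	}()
	return c, nil
}

// Get returns the compiled template, parsing and watching it on first use.
func (c *TemplateCache) Get(path string) (*template.Template, error) {
	c.mu.RLock()
	t, ok := c.tmpls[path]
	c.mu.RUnlock()
	if ok {
		return t, nil
	}
	t, err := template.ParseFiles(path)
	if err != nil {
		return nil, err
	}
	if err := c.watcher.Add(path); err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.tmpls[path] = t
	c.mu.Unlock()
	return t, nil
}
```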

mholt added the feature ⚙️ New feature or request label Mar 3, 2017

maruel commented Mar 4, 2017

I migrated from markdown to using hugo directly, which means I'm not personally blocked on this anymore: I don't need to restart caddy when I modify the templates. I'm keeping the issue open, as this is still a serious deficiency in the markdown module IMHO.


mholt commented Mar 4, 2017

Thanks for clarifying. I'm beginning to agree: markdown should probably parse the templates at request time (and then cache them, as you're proposing; that should help with performance).

I'm booked up for a while but anyone is welcome to tackle this! If the templates and markdown directives share any logic, try to make the code common between them. Maybe caching is something that the httpserver provides for middleware.


mholt commented Oct 11, 2019

@maruel Good news, we've got an official cache middleware started here in the v2 branch (Caddy 2 is currently in beta): https://github.com/caddyserver/caddy/tree/v2/modules/caddyhttp/httpcache

It is suitable for any HTTP response, not just those from templates.

It's a working, simple PoC right now but I'd love some help getting it finished. Can you give it a try and we can work on it together?


maruel commented Oct 12, 2019

Hi! What would be the best avenue to discuss the design choices? I can't promise anything beyond that, but at the very least a discussion about the design trade-offs would be good.


mholt commented Oct 12, 2019

@maruel Either here, or if you want some tighter feedback, I can invite you to our dev Slack. As long as we summarize what we decide here it should be fine. Want me to send an email to the address on your GitHub profile?


maruel commented Oct 19, 2019

Observations:

  • I recommend against encoding/gob; instead use something that is not self-describing and uses flat memory, like flatbuffer, capnproto, etc. I don't have a preference on which.
  • Think about keeping the data pre-compressed, probably with gzip, and handle this explicitly so as not to break clients that don't support the same compression (sketched after this list). The advantages are:
    • Data is compressed only once for HTTP/1 clients supporting gzip.
    • Data takes less space in the cache.
    • Decompression is much faster than compression, so it's not a big deal to decompress for an HTTP/2 client and recompress on the fly.
  • Cache HEAD too.
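
A sketch of the pre-compression point: the body is compressed exactly once at cache-fill time, served verbatim to clients that accept gzip, and inflated on the fly for the rest. The names are illustrative, and real content negotiation would need proper q-value parsing of Accept-Encoding:

```go
package cache

import (
	"bytes"
	"compress/gzip"
	"io"
	"net/http"
	"strings"
)

type cachedResponse struct {
	gzBody []byte // compressed once, at cache-fill time
}

func (c *cachedResponse) serve(w http.ResponseWriter, r *http.Request) error {
	// Naive negotiation; real code must parse q-values in Accept-Encoding.
	if strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
		w.Header().Set("Content-Encoding", "gzip")
		_, err := w.Write(c.gzBody)
		return err
	}
	// Decompression is cheap relative to compression, so inflating for
	// clients without gzip support costs little.
	zr, err := gzip.NewReader(bytes.NewReader(c.gzBody))
	if err != nil {
		return err
	}
	defer zr.Close()
	_, err = io.Copy(w, zr)
	return err
}
```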

Since it's fairly involved to get everything right, maybe file separate GitHub issues?


mholt commented Oct 19, 2019

@maruel

I recommend against encoding/gob; instead use something that is not self-describing and uses flat memory, like flatbuffer, capnproto, etc. I don't have a preference on which.
https://play.golang.org/p/G3EgQIDCABG shows that a trivial header gob-encodes to 153 bytes, which is relatively large overhead.

Good point; I'm totally in favor of this. I just whipped it up real quick as a PoC, and we need something better. Would anyone have a chance to do a quick comparison of which serialization format is most efficient? We don't necessarily need type information embedded in the serialization, since we have the very specific case of just a map of strings.

Honestly, we could probably even roll our own VERY simple serialization format that just concatenates the header fields. It's for internal use only, so a general-purpose serialization format might be overkill. (A sketch of that idea follows.)
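
For instance, a hand-rolled format could length-prefix each key and value, avoiding gob's per-stream type metadata entirely. A hypothetical sketch; Encode is an illustrative name and this wire format is not something anyone has agreed on:

```go
package headerenc

import (
	"bytes"
	"encoding/binary"
	"net/http"
)

// Encode writes each key/value pair as uvarint-length-prefixed strings,
// terminated by a zero-length key (header names are never empty).
func Encode(h http.Header) []byte {
	var buf bytes.Buffer
	var tmp [binary.MaxVarintLen64]byte
	writeStr := func(s string) {
		n := binary.PutUvarint(tmp[:], uint64(len(s)))
		buf.Write(tmp[:n])
		buf.WriteString(s)
	}
	for key, vals := range h {
		for _, v := range vals {
			writeStr(key)
			writeStr(v)
		}
	}
	writeStr("") // zero-length key terminates the stream
	return buf.Bytes()
}
```

Decoding would simply mirror this: read a uvarint length, then that many bytes, until a zero-length key is reached.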

Think about keeping the data pre-compressed, probably with gzip, and handle this explicitly so as not to break clients that don't support the same compression.

One advantage we have here is that the cache can be added anywhere in the middleware chain. We can, for instance, add it before the gzip handler, so that by the time the response comes back to the cache handler to be stored, it is already compressed. :) So this might already be done by default thanks to Caddy 2's architecture! (A generic sketch of this ordering follows.)
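
To make the ordering concrete, here is a generic net/http sketch, not Caddy 2's actual handler code: because the cache middleware wraps the gzip middleware, the bytes the cache captures on the way back out are already compressed.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"net/http"
)

// gzipMiddleware compresses every response. (Real code would negotiate
// Accept-Encoding and handle Content-Length; omitted to keep this short.)
func gzipMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Encoding", "gzip")
		gz := gzip.NewWriter(w)
		defer gz.Close()
		next.ServeHTTP(gzipResponseWriter{ResponseWriter: w, zw: gz}, r)
	})
}

type gzipResponseWriter struct {
	http.ResponseWriter
	zw *gzip.Writer
}

func (g gzipResponseWriter) Write(p []byte) (int, error) { return g.zw.Write(p) }

// cacheMiddleware captures whatever flows back through it. Because it sits
// outside gzipMiddleware below, the bytes it sees are already compressed.
func cacheMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var buf bytes.Buffer
		next.ServeHTTP(teeResponseWriter{ResponseWriter: w, buf: &buf}, r)
		_ = buf // buf now holds gzip-compressed bytes, ready to store
	})
}

type teeResponseWriter struct {
	http.ResponseWriter
	buf *bytes.Buffer
}

func (t teeResponseWriter) Write(p []byte) (int, error) {
	t.buf.Write(p)
	return t.ResponseWriter.Write(p)
}

func main() {
	origin := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello from the origin handler\n"))
	})
	// Outermost cache, then gzip, then the origin handler.
	http.ListenAndServe(":8080", cacheMiddleware(gzipMiddleware(origin)))
}
```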

Cache HEAD too.

Agreed, and specifically what is allowed to be cached needs to be very finely controlled and configurable, so we'll have to expand on this.

Since it's fairly involved to get everything right, maybe file separate GitHub issues?

Yes, definitely. Feel free to split this into a new issue to continue discussion on whatever part you think we should work on first!


mholt commented Apr 1, 2020

Moving discussion to new repo dedicated for the caching layer: caddyserver/cache-handler#1

mholt closed this as completed Apr 1, 2020