Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

caching approach for nginx-proxy #54

Closed
gingerlime opened this issue Aug 17, 2019 · 12 comments
Closed

caching approach for nginx-proxy #54

gingerlime opened this issue Aug 17, 2019 · 12 comments
Labels

Comments

@gingerlime
Copy link
Contributor

The nginx-proxy image is used for proxying thumbor and primarily for caching (+ easier handling of CORS perhaps). It's built on top of https://github.com/jwilder/nginx-proxy with the caching parts based on APSL's original implementation.

It serves well, but recently I'm having second thoughts if this is the best approach. Reasons:

Those aren't the core issues however in my opinion, but rather a symptom. Why?

The caching is relying on thumbor to store cached images in a results storage folder, and then fetch them if they exist using try_files. This works well, but has some limitations. For example:

  • content-type headers are unknown just from looking at the cached file. We essentially "guess" them based on the filename (e.g. if it has ".png" in the filename then it's an image/png), or the filter... But if we serve a file that has no extension and no filter (not very common, but can happen), then we just can't serve the right content-type header
  • WEBP again is an edge-case and depends on browser headers + result storage
  • there's quite a bit of "code" (nginx configuration) to deal with this. For example calculating the hash of the request via JS to find the cached file, plus all the content-type handling etc
  • In order to work, nginx-proxy and thumbor must share the "data" folder where result storage is kept

It feels a bit like re-inventing the wheel here. After all, nginx has built-in proxy caching functionality, so why not rely on it?

I played around with it a bit, and I think it can be a simpler, more elegant and just as performant (or perhaps even more) if we simply use the built-in proxy_cache directives.

Advantages:

  • No need to save results storage in thumbor, caching is independent
  • caching disk space / retention etc are also independent
  • no need to share a volume or "find" the results cache files, nginx takes care of it by itself
  • content-type header cached "for free". Nginx "remembers" it as far as I can tell (TODO: check what happens with WEBP, but I imagine it would work)
  • less modification to the nginx.tmpl file compared to the base

Disadvantages:

  • Need for custom variables for proxy settings (e.g. cache storage size, expiration, etc), or let the users "bring their own" proxy.conf file? (but that won't be very useful)
  • breaks backwards-compatibility if we simply change how nginx-proxy works... So maybe introduce a new proxy? then there are two, so which one to pick?? the transition might be tricky... need to think about it
@gingerlime
Copy link
Contributor Author

@kkopachev @limuhob @korneevm thoughts? (sorry for the long essay, but was trying to explain my thinking about it, let me know if something's not clear)

gingerlime pushed a commit that referenced this issue Aug 18, 2019
* proof-of-concept (WIP) using internal nginx proxy caching
  instead of relying on thumbor's result storage and try_files
@gingerlime
Copy link
Contributor Author

I created a proof-of-concept PR #55 (some tests are failing, but that's because they are based on the current cache implementation)

@limuhob
Copy link
Contributor

limuhob commented Aug 18, 2019

I think It is a good idea. I don't have experiences with nginx proxy caching so I don't know limitation of this approach but I will try some experiments with it.

@limuhob
Copy link
Contributor

limuhob commented Aug 20, 2019

I seems to me that It works well. A potential disadvantage for someone could be extra disk storage which takes nginx cache. Files are "duplicated" in nginx cache and in thumbor result storage.

@gingerlime
Copy link
Contributor Author

Thanks @limuhob. I think if you opt to use nginx caching, then you should turn off your results storage actually. Like I did on the sample docker-compose recipe https://github.com/MinimalCompact/thumbor/pull/55/files#diff-cc2321b5e66d16f3f7bac0e5e3195bcaR20

@limuhob
Copy link
Contributor

limuhob commented Aug 20, 2019

Yes, you are right!

@gingerlime
Copy link
Contributor Author

This is also part of the "backwards breaking" changes though if we replace nginx-proxy with this version though... Because if people keep storing the results in the results_storage and then "upgrade" to the new version of nginx-proxy, then they're wasting 2x disk space unnecessarily...

I wonder what's the best way to communicate this change if we go down this path. Perhaps renaming it to something slightly different (maybe thumbor-nginx-cache or something?)...

Curious to hear what @kkopachev and @korneevm think as well :)

@korneevm
Copy link
Contributor

Hi, @gingerlime!

It seems like quite an elegant solution to me. I'll try to test it on my installation later this week.

@gingerlime
Copy link
Contributor Author

Thank you @korneevm. I really appreciate it. Let me know if you have any questions.

@kkopachev
Copy link
Collaborator

I think this is the way to go. Nginx had to know too much on how thumbor was storing files in result storage.
New container name seems good solution.

@gingerlime
Copy link
Contributor Author

gingerlime commented Aug 24, 2019

Thanks everyone. Glad to hear this is a good approach. I'll continue working on it. I still need to find some simple ways to configure caching (for example: be able to easily set cache diskspace max size, expiry time with environment variables etc, and also to allow to easily overwrite with a proxy.conf if someone wants).

@gingerlime
Copy link
Contributor Author

I updated PR #55 to create a new minimalcompact/thumbor-nginx-proxy-cache image, and also add a deprecation warning to the old minimalcompact/thumbor-nginx-proxy image... Would be great if if can take a look at the PR and let me know if I missed anything or if you think it's ready to merge :)

gingerlime pushed a commit that referenced this issue Sep 11, 2019
* refs #54 nginx-proxy caching

* proof-of-concept (WIP) using internal nginx proxy caching
  instead of relying on thumbor's result storage and try_files
* adding new image: thumbor-nginx-proxy-cache
* the new nginx-proxy-cache will replace nginx-proxy with built-in proxy
  caching by nginx instead of mapping to thumbor's data folder
* refreshed nginx.tmpl from upstream (jwilder/nginx-proxy)
* updated template to add thumbor + proxy caching with minimal control
  using environment variables (for further customization, users can
  supply a proxy.conf file)
* updated tests: added proxy caching tests + removed some tests that are
  unnecessary or difficult to do with the built-in proxy caching
  (for example, overriding the cached data is somehow detected by nginx)
* updated recipes to reflect the new setup (no need for result storage
  caching in thumbor, and no data volume)
* revert changes to (deprecated) nginx-proxy ; adding deprecation warning
* added a note to README
* fix wrong name in deprecated
* updated default cache size (to 10g) and inactive time (to 48h)
* added README for nginx-proxy-cache
* updated recipes to add a persisted cache folder
* changed environment variables to distinguish between max_size (on disk) and memory size
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants