Please implement caching #15

Closed
kseistrup opened this issue Feb 15, 2017 · 8 comments

Comments

@kseistrup

Someone, please tell me why on Earth Camo decides to fetch the same effing file 30 times (and counting, I should say!) in a matter of just 43 seconds?

15/Feb/2017:09:12:10 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:11 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:12 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:13 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:14 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:15 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:17 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:18 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:19 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:20 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:21 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:27 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:28 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:29 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:30 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:30 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:32 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:33 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:34 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:35 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:37 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:38 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:39 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:41 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:42 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:45 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:46 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:50 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:51 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo
15/Feb/2017:09:12:53 +0100	176.9.43.119	https://${hostname}/0d61156a-f0a4-5383-9ae8-b72430f3fa50.png	GET HTTP/1.1 200 92320	go-camo

Ever heard of If-Modified-Since? Or ETag?

Seriously guys, please implement some caching!

@dropwhile
Member

It is always nice to hear that someone is using Go-Camo!

While I am sorry you are not having a great experience, your request for assistance from the solo developer of a free project could maybe be a bit... nicer?

On to your issue.

From the README:

Note that it is recommended to front Go-Camo with a CDN when possible.

Go-Camo indeed does no caching of its own. This is intentional, in order to mirror the behavior of the atmos/camo project. Client request headers relevant to caching, in this case If-None-Match (the ETag validator) and If-Modified-Since, are passed on to the origin server.

However, ETag is not currently passed through in the response. It was initially omitted in an attempt to protect end-user privacy, as ETags are often used for tracking. The omission does seem inconsistent, though (it appears atmos/camo also added it at one point), so I will consider it a bug and fix it.

As to the repeated requests you are seeing, presumably from the same IP address, you likely have a visitor (or bot/scraper) that is purposefully, accidentally, or erroneously requesting the same content over and over.

On to possible remedies. If you do desire a more integrated caching solution, you could:

  • Put a cache/CDN in front of Go-Camo.
  • Tell Go-Camo to make requests through a caching HTTP proxy such as squid. The Go stdlib HTTP client should honor the HTTP_PROXY environment variable. Do note that this is not a tested configuration.
  • Try alternative software, perhaps willnorris/imageproxy or atmos/camo (though atmos/camo does not do any caching either, so presumably it wouldn't fit your needs).
  • Fork this project and add caching functionality for your own uses.

@kseistrup
Author

Thanks for your lengthy reply. I am not a podmin who has set up go-camo; I am just a regular user who has observed that the same pictures are being fetched from my home server over and over at short intervals, which I find unacceptable.

I will see if I can find the responsible podmin and direct my request at them, although this may prove to be a quixotic task if I have to trace and email every podmin who is using go-camo.

@dropwhile
Member

dropwhile commented Feb 15, 2017

Ah. I was under the impression that you were running Go-Camo yourself.

In this case it sounds like Go-Camo is being hosted by someone else and is simply proxying end-user requests (likely from browsers) for content. Someone probably just linked an image you host on a forum.

From your perspective it looks odd to have them all coming from a single IP, but that is the proxy in action. Go-Camo should only be requesting the content once per client request (so, 30 views over 43 seconds).

Go-Camo does supply a default User-Agent of "go-camo", which you could use to reject the requests. Likewise, you could block by IP, just as you would anyone else requesting content from your server more often than you find desirable.

@kseistrup
Author

I do not believe this is happening just because someone has linked to the image on a forum. That single image has been requested 4983 times (and counting!) in the past 12 hours. I have contacted someone who may (or may not, who knows) be the podmin of sechat.org and asked him to correct the issue. Otherwise I will certainly block the IP address (but that is one sinner out of many, which is why I used the term quixotic in my previous reply).

That said, I still think it would be beneficial if go-camo could implement a cache of its own. After all, go-camo (or Camo proper) is the one that first sees a given media file. I'm not talking about an elaborate caching system here, nor about caching gigabyte after gigabyte of data. But implementing, say, a ring-buffer-like cache that remembers the last N media files that have been requested would do a lot to alleviate a situation like the one I described above.
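To make the proposal concrete, a ring-buffer-like cache of the last N files could be sketched as below. This only illustrates the idea; Go-Camo does not contain such a cache. The type `ringCache` and its methods are hypothetical names, URLs are assumed to be non-empty strings, and HTTP freshness rules (Cache-Control, validators) are ignored entirely.

```go
package main

import (
	"fmt"
	"sync"
)

// ringCache remembers the bodies of the last N distinct URLs stored in it.
// Once full, each new URL overwrites the oldest one, like a ring buffer.
type ringCache struct {
	mu    sync.Mutex
	keys  []string          // fixed-size ring of cached URLs ("" = empty slot)
	items map[string][]byte // URL -> cached body
	next  int               // slot to overwrite on the next insert
}

// newRingCache creates a cache holding at most n entries (at least one).
func newRingCache(n int) *ringCache {
	if n < 1 {
		n = 1
	}
	return &ringCache{keys: make([]string, n), items: make(map[string][]byte, n)}
}

// Get returns the cached body for url, if present.
func (c *ringCache) Get(url string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	b, ok := c.items[url]
	return b, ok
}

// Put stores body under url, evicting the oldest entry when the ring is full.
func (c *ringCache) Put(url string, body []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.items[url]; ok {
		c.items[url] = body // already cached: refresh in place, keep its slot
		return
	}
	if old := c.keys[c.next]; old != "" {
		delete(c.items, old)
	}
	c.keys[c.next] = url
	c.items[url] = body
	c.next = (c.next + 1) % len(c.keys)
}

func main() {
	c := newRingCache(2)
	c.Put("a.png", []byte("A"))
	c.Put("b.png", []byte("B"))
	c.Put("c.png", []byte("C")) // ring is full, so a.png (the oldest) is dropped

	_, hasA := c.Get("a.png")
	_, hasC := c.Get("c.png")
	fmt.Println("a.png cached:", hasA, "c.png cached:", hasC)
}
```

Eviction here is by insertion order rather than by last use, which matches the ring-buffer wording; a least-recently-used policy would be a different design.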

@denschub

a ring-buffer like cache that remembers the last N media files that has been requested

Regardless of the language of implementation (be it the original Camo, Go-Camo, or something else), Camo is an image proxy. Nothing else. A proxy. In fact, caching media files hosted on other servers would be a privacy violation in many countries.

@kseistrup
Author

Right, so I really only have very few options:

  1. Contact the podmin of each and every Diaspora* pod and ask them to set up a CDN
  2. Block abusing IP addresses as I see them coming
  3. Block all requests made by Camo or go-camo
  4. Use a CDN such as Cloudflare myself, all for making the mistake of hosting a few shitty pictures

@zauberstuhl

It is always nice to hear that someone is using Go-Camo!

Well, I love it! Keep up the good work \m/

@kseistrup keep calm, I am fixing it for the mentioned IP.

I also don't see a reason for implementing caching functionality in go-camo, since every modern web server supports it. In my case it was a simple misconfiguration of the go-camo instance.

making the mistake of hosting a few shitty pictures

Well, I agree that my setup wasn't the ideal one, but hosting files generally leads to high traffic, especially if you post stuff on a social network.

Anyway, we are way off-topic.

@dropwhile
Member

@kseistrup My personal opinion is that #1 would be a waste of your time and likely more hassle than it is worth. #2 and #3 are pretty standard requirements for most public-facing websites, and those are likely your best options. #4 might work out well for you too. Depending on economics, you may have another option as well:

  5. Move your web server to a hosting service. Most provide fairly generous bandwidth. You could do shared hosting or rent a VM (DigitalOcean, Vultr, etc.), depending on requirements. This can range from fairly cheap ($5 a month) to as expensive as you want it to be (more CPUs, more RAM, etc.).

Closing issue, as I don't think there is much more ground to cover here.
