Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot match current iteration key ETag #58

Closed
jpds opened this issue Jul 30, 2023 · 9 comments · Fixed by darkweak/souin#357
Closed

Cannot match current iteration key ETag #58

jpds opened this issue Jul 30, 2023 · 9 comments · Fixed by darkweak/souin#357
Assignees
Labels
bug Something isn't working

Comments

@jpds
Copy link

jpds commented Jul 30, 2023

I have this plugin configured as follows:

{
	on_demand_tls {
		ask http://localhost:3903/check
		interval 2m
		burst 5
	}

	order cache before rewrite

	cache {
		allowed_http_verbs GET
		default_cache_control public
		nuts {
			path /var/db/caddy/souin/
		}
		ttl 8h
	}
}

https:// {
	cache {
		timeout {
			backend 30s
		}
	}

	tls {
		on_demand
	}

	reverse_proxy localhost:3902
}

However, every time I hit one of the domains fronted by this configuration, I see this in the logs:

$ curl -v https://www.domain.com/
...
 TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 200
< accept-ranges: bytes
< alt-svc: h3=":443"; ma=2592000
< cache-control: public
< cache-status: Souin; fwd=uri-miss; stored; key=GET-https-www.domain.com-/
< content-security-policy: default-src 'self';
< content-type: text/html
< date: Sun, 30 Jul 2023 22:46:54 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT
...

And the ETag never changes between curl runs, however in the logs I then see:

{"level":"info","ts":1690757113.7676728,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757115.2038972,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757116.2018719,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757117.201102,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"info","ts":1690757118.0917957,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
@darkweak
Copy link
Collaborator

Hello @jpds thank you for this feedback and the detailed issue.
AFAIK we're not handling the case the stored response contains an ETag and the client doesn't send one. We will fix that asap.

@jpds
Copy link
Author

jpds commented Aug 10, 2023

I built the linked PR with:

$ xcaddy build v2.7.3 --with github.com/darkweak/souin@7cc8ae262d11cc1788e0f2c5974af40d717452d8 --with github.com/caddyserver/cache-handle
...
go: added github.com/darkweak/souin v1.6.40-0.20230810001327-7cc8ae262d11
...
go: added github.com/caddyserver/cache-handler v0.8.0

However, the ETag still does not seem to be cached, I tripled tested this with Firefox/Chromium/curl:

{"level":"debug","ts":1691685939.2154143,"logger":"http.handlers.cache","msg":"Incomming request &{Method:GET URL:/ Proto:HTTP/3.0 ProtoMajor:3 ProtoMinor:0 Header:map[Accept:[text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8] Accept-Encoding:[gzip, deflate, br] Accept-Language:[en-US,en;q=0.5] Alt-Used:[www.domain.com] If-Modified-Since:[Thu, 02 Feb 2023 12:56:22 GMT] If-None-Match:[\"636849236767b0deb94dd80d04c5bc74\"] Priority:[u=1] Sec-Fetch-Dest:[document] Sec-Fetch-Mode:[navigate] Sec-Fetch-Site:[none] Sec-Fetch-User:[?1] Upgrade-Insecure-Requests:[1] User-Agent:[Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0]] Body:0x8656ce1e0 GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:www.domain.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:[2a0c:XXXX:XX::XXXX]:59406 RequestURI:/ TLS:0x864de7b80 Cancel:<nil> Response:<nil> ctx:0x86561b560}"}
{"level":"debug","ts":1691685939.2157462,"logger":"http.handlers.cache","msg":"Request cache-control &{MaxAge:-1 MaxStale:-1 MaxStaleSet:false MinFresh:-1 NoCache:false NoStore:false NoTransform:false OnlyIfCached:false StaleIfError:0 Extensions:[]}"}
{"level":"debug","ts":1691685939.215986,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ didn't match the current iteration key ETag GET-https-www.domain.com-/"}
{"level":"debug","ts":1691685939.2161043,"logger":"http.handlers.cache","msg":"Request the upstream server"}
{"level":"debug","ts":1691685939.2208962,"logger":"http.handlers.reverse_proxy","msg":"selected upstream","dial":"localhost:3902","total_upstreams":1}
{"level":"debug","ts":1691685939.2755642,"logger":"http.handlers.reverse_proxy","msg":"upstream roundtrip","upstream":"localhost:3902","duration":0.054328866,"request":{"remote_ip":"2a0c:XXXX:XX::XXXX","remote_port":"59406","client_ip":"2a0c:XXXX:XX::XXXX","proto":"HTTP/3.0","method":"GET","host":"www.domain.com","uri":"/","headers":{"Date":["Thu, 10 Aug 2023 16:45:39 UTC"],"Priority":["u=1"],"User-Agent":["Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0"],"Sec-Fetch-Site":["none"],"X-Forwarded-Proto":["https"],"Upgrade-Insecure-Requests":["1"],"Sec-Fetch-User":["?1"],"Accept-Encoding":["gzip, deflate, br"],"Sec-Fetch-Dest":["document"],"X-Forwarded-Host":["www.domain.com"],"Sec-Fetch-Mode":["navigate"],"If-None-Match":["\"636849236767b0deb94dd80d04c5bc74\""],"If-Modified-Since":["Thu, 02 Feb 2023 12:56:22 GMT"],"Accept-Language":["en-US,en;q=0.5"],"Alt-Used":["www.domain.com"],"X-Forwarded-For":["2a0c:XXXX:XX::XXXX"],"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"]},"tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h3","server_name":"www.domain.com"}},"headers":{"Date":["Thu, 10 Aug 2023 16:45:39 GMT"]},"status":304}

curl always returns:

< cache-control: public, max-age=86400
< cache-status: Souin; fwd=uri-miss; stored; key=GET-https-www.domain.com-/
< date: Thu, 10 Aug 2023 16:55:39 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT

@darkweak
Copy link
Collaborator

@jpds I updated the codebase in the PR and added a unit test about this feature. 🤞

@jpds
Copy link
Author

jpds commented Aug 11, 2023

@darkweak Thank you very much for the fix. I can confirm that the logs say:

{"level":"debug","ts":1691712666.6366026,"logger":"http.handlers.cache","msg":"The key GET-https-www.domain.com-/ matched the current iteration key ETag GET-https-www.domain.com-/"}

However, every request now seems to request revalidation:

< cache-control: public, max-age=86400
< cache-status: Souin; fwd=request; fwd-status=200; key=GET-https-www.domain.com-/; detail=REQUEST-REVALIDATION
< content-type: text/html
< date: Fri, 11 Aug 2023 00:38:44 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT

Is there a configuration option I'm missing?

Edit: I've also just discovered that the backend I'm using also does caching and here's the logic they use: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/67b1457c77d930cd12f83f5cf07ec7f2376fa00c/src/api/s3/get.rs#L52-L89

@darkweak
Copy link
Collaborator

@jpds I updated the debug message to display the current validator (that contains the ETagResponse and the ETagRequest) with that you'll be able to find if there is an ETag mismatch.

@jpds
Copy link
Author

jpds commented Aug 11, 2023

None of my client requests will be sending ETags, I believe - here's the new log output:

{"level":"debug","ts":1691742614.1066208,"logger":"http.handlers.cache","msg":"Incomming request &{Method:GET URL:/ Proto:HTTP/2.0 ProtoMajor:2 ProtoMinor:0 Header:map[Accept:[*/*] User-Agent: [curl/8.0.1]] Body:0x8743a4210 GetBody:<nil> ContentLength:0 TransferEncoding:[] Close:false Host:www.domain.com Form:map[] PostForm:map[] MultipartForm:<nil> Trailer:map[] RemoteAddr:[2a0c:XXXX:XX::XXXX]:55244 RequestURI:/ TLS:0x853eea4d0 Cancel:<nil> Response:<nil> ctx:0x874332ed0}"}
{"level":"debug","ts":1691742614.106806,"logger":"http.handlers.cache","msg":"Request cache-control &{MaxAge:-1 MaxStale:-1 MaxStaleSet:false MinFresh:-1 NoCache:false NoStore:false NoTransform:false OnlyIfCached:false StaleIfError:0 Extensions:[]}"}
{"level":"debug","ts":1691742614.1069987,"logger":"http.handlers.cache","msg":"The stored key GET-https-www.domain.com-/ matched the current iteration key ETag &{Matched:true IfNoneMatchPresent:false IfMatchPresent:false IfModifiedSincePresent:false IfUnmodifiedSincePresent:false IfUnmotModifiedSincePresent:false NeedRevalidation:true IfModifiedSince:0001-01-01 00:00:00 +0000 UTC IfUnmodifiedSince:0001-01-01 00:00:00 +0000 UTC IfNoneMatch:[] IfMatch:[] RequestETag: ResponseETag:\"636849236767b0deb94dd80d04c5bc74\"}"}
{"level":"debug","ts":1691742614.1070535,"logger":"http.handlers.cache","msg":"Revalidate the request with the upstream server"}
{"level":"debug","ts":1691742614.1091118,"logger":"http.handlers.reverse_proxy","msg":"selected upstream","dial":"localhost:3902","total_upstreams":1}
{"level":"debug","ts":1691742614.1201108,"logger":"http.handlers.reverse_proxy","msg":"upstream roundtrip","upstream":"localhost:3902","duration":0.010700285,"request":{"remote_ip":"2a0c:XXXX:XX::XXXX","remote_port":"55244","client_ip":"2a0c:XXXX:XX::XXXX","proto":"HTTP/2.0","method":"GET","host":"www.domain.com","uri":"/","headers":{"User-Agent":["curl/8.0.1"],"Accept":["*/*"],"Date":["Fri, 11 Aug 2023 08:30:14 UTC"],"Etag":["\"636849236767b0deb94dd80d04c5bc74\""],"X-Forwarded-For":["2a0c:XXXX:XX::XXXX"],"X-Forwarded-Proto":["https"],"X-Forwarded-Host":["www.domain.com"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"www.domain.com"}},"headers":{"Accept-Ranges":["bytes"],"Etag":["\"636849236767b0deb94dd80d04c5bc74\""],"Content-Length":["777"],"Date":["Fri, 11 Aug 2023 08:30:14 GMT"],"Content-Type":["text/html"],"Last-Modified":["Thu, 02 Feb 2023 12:56:22 GMT"]},"status":200}
{"level":"debug","ts":1691742614.1225867,"logger":"http.handlers.cache","msg":"Response cache-control &{MustRevalidate:false NoCache:map[] NoCachePresent:false NoStore:false NoTransform:false Public:true Private:map[] PrivatePresent:false ProxyRevalidate:false MaxAge:86400 SMaxAge:-1 Immutable:false StaleIfError:-1 StaleWhileRevalidate:-1 Extensions:[]}"}
{"level":"debug","ts":1691742614.1228776,"logger":"http.handlers.cache","msg":"Store the response {Status: StatusCode:200 Proto: ProtoMajor:0 ProtoMinor:0 Header:map[Accept-Ranges:[bytes] Alt-Svc:[h3=\":443\"; ma=2592000] Cache-Control:[public, max-age=86400] Content-Length:[777] Content-Security-Policy:[default-src 'self';] Content-Type:[text/html] Date:[Fri, 11 Aug 2023 08:30:14 GMT] Etag:[\"636849236767b0deb94dd80d04c5bc74\"] Last-Modified:[Thu, 02 Feb 2023 12:56:22 GMT] Permissions-Policy:[interest-cohort=()] Referrer-Policy:[no-referrer-when-downgrade] Server:[Caddy] Strict-Transport-Security:[max-age=31536000;] X-Content-Type-Options:[nosniff] X-Frame-Options:[DENY] X-Souin-Stored-Ttl:[24h0m0s]] Body:{Reader:...

All the ETags involved are coming from the backend server - hence I was wondering if there was some mismatched expectations in what the backend is proposing to Souin.

Edit: and the backend in question is a static S3 bucket with simple web content: https://garagehq.deuxfleurs.fr/documentation/cookbook/exposing-websites/ - hence I was hoping that Caddy could cache the latest copy of this content locally without having to ask the backend storage servers what was available.

@jpds
Copy link
Author

jpds commented Aug 11, 2023

I've been playing with curl ( https://daniel.haxx.se/blog/2019/12/06/curl-speaks-etag/ ) to try and better understand this and I've found:

With an ETag, curl bounces between REQUEST-REVALIDATION and UNCACHEABLE-STATUS-CODE:

# First attempt
$ curl -v --etag-compare etag.txt --etag-save etag.txt https://www.domain.com -o saved-file
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /]
* h2h3 [:scheme: https]
* h2h3 [:authority: www.domain.com]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* h2h3 [if-none-match: ""]
* Using Stream ID: 1 (easy handle 0x55bb424d1350)
} [5 bytes data]
> GET / HTTP/2
> Host: www.domain.com
> user-agent: curl/8.0.1
> accept: */*
> if-none-match: ""
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [130 bytes data]
< HTTP/2 200
< accept-ranges: bytes
< alt-svc: h3=":443"; ma=2592000
< cache-control: public, max-age=86400
< cache-status: Souin; fwd=request; fwd-status=200; key=GET-https-www.domain.com-/; detail=REQUEST-REVALIDATION
< content-type: text/html
< date: Fri, 11 Aug 2023 10:00:35 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT
< server: Caddy
< strict-transport-security: max-age=31536000;
< content-length: 777
<
{ [5 bytes data]
100   777  100   777    0     0   3870      0 --:--:-- --:--:-- --:--:--  3885

# Second attempt
$ curl -v --etag-compare etag.txt --etag-save etag.txt https://www.domain.com -o saved-file
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /]
* h2h3 [:scheme: https]
* h2h3 [:authority: www.domain.com]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* h2h3 [if-none-match: "636849236767b0deb94dd80d04c5bc74"]
* Using Stream ID: 1 (easy handle 0x55bdb2de0350)
} [5 bytes data]
> GET / HTTP/2
> Host: www.domain.com
> user-agent: curl/8.0.1
> accept: */*
> if-none-match: "636849236767b0deb94dd80d04c5bc74"
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [130 bytes data]
< HTTP/2 304
< alt-svc: h3=":443"; ma=2592000
< cache-status: Souin; fwd=uri-miss; key=GET-https-www.domain.com-/; detail=UNCACHEABLE-STATUS-CODE
< date: Fri, 11 Aug 2023 10:07:37 GMT
< server: Caddy
< strict-transport-security: max-age=31536000;
< content-length: 0
<
{ [0 bytes data]
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0

If I use the older if-modified-since method, I then see Souin hits:

$ curl -v -z "11 Aug 2023" https://www.domain.com
* using HTTP/2
* h2h3 [:method: GET]
* h2h3 [:path: /]
* h2h3 [:scheme: https]
* h2h3 [:authority: www.domain.com]
* h2h3 [user-agent: curl/8.0.1]
* h2h3 [accept: */*]
* h2h3 [if-modified-since: Fri, 11 Aug 2023 00:00:00 GMT]
* Using Stream ID: 1 (easy handle 0x563f2a52c350)
> GET / HTTP/2
> Host: www.domain.com
> user-agent: curl/8.0.1
> accept: */*
> if-modified-since: Fri, 11 Aug 2023 00:00:00 GMT
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/2 200
< accept-ranges: bytes
< age: 126
< alt-svc: h3=":443"; ma=2592000
< cache-control: public, max-age=86400
< cache-status: Souin; hit; ttl=86274; key=GET-https-www.domain.com-/
< content-type: text/html
< date: Fri, 11 Aug 2023 10:07:35 GMT
< etag: "636849236767b0deb94dd80d04c5bc74"
< last-modified: Thu, 02 Feb 2023 12:56:22 GMT
< server: Caddy
< strict-transport-security: max-age=31536000;
< content-length: 777
<
* The requested document is not new enough
* Simulate an HTTP 304 response

@darkweak
Copy link
Collaborator

Hey @jpds sorry for the delay.
I just updated the PR, let me know if there are other weird behaviours.

@jpds
Copy link
Author

jpds commented Aug 16, 2023

@darkweak Now it is all working - perfectly 🎉 . Seeing hits everywhere.

@darkweak darkweak self-assigned this Aug 17, 2023
@darkweak darkweak added the bug Something isn't working label Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants