Caching: move bulk of Max-Age / TTL calculation part to DoC server #19

Merged
merged 3 commits into from
Jun 23, 2022

Conversation

miri64
Collaborator

@miri64 miri64 commented Mar 7, 2022

This offers an alternative to https://github.com/anr-bmbf-pivot/draft-dns-over-coap/pull/17 in response to https://github.com/anr-bmbf-pivot/draft-dns-over-coap/pull/17#issuecomment-1059931619.

Instead of the DoC server using the minimum TTL as Max-Age and the DoC client then having to calculate the "true TTL" from the Max-Age after caching, the bulk of the checking and calculating now happens at the (presumably more powerful) DoC server: the DoC server takes the minimum TTL, uses it as Max-Age, and subtracts it from all TTLs in the DNS response. The DoC client then only takes the Max-Age and adds it to all TTLs again (instead of first having to search for the minimum TTL, calculate the difference, and then add it to all TTLs).
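The split of work described above could look roughly like the following. This is only a sketch: the `Record` type and the function names `server_prepare` and `client_restore` are made up for illustration and are not taken from the draft or from any implementation.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified stand-in for a DNS resource record."""
    name: str
    ttl: int
    rtype: str
    rdata: str


def server_prepare(records: List[Record]) -> Tuple[int, List[Record]]:
    """DoC server side: use the minimum TTL as Max-Age and subtract it
    from every TTL in the response."""
    max_age = min(r.ttl for r in records)
    return max_age, [replace(r, ttl=r.ttl - max_age) for r in records]


def client_restore(max_age: int, records: List[Record]) -> List[Record]:
    """DoC client side: add the received Max-Age back onto every TTL."""
    return [replace(r, ttl=r.ttl + max_age) for r in records]


records = [
    Record("example.org", 200, "A", "192.0.2.7"),
    Record("example.org", 350, "AAAA", "2001:db8::7"),
]
max_age, adjusted = server_prepare(records)
# Assume the response then sat in a CoAP cache for 80s, so the client
# sees Max-Age=120 instead of 200.
restored = client_restore(120, adjusted)
```

The client never has to look for the minimum TTL itself; whatever Max-Age arrives (possibly reduced by caches along the way) is simply added to each TTL.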

For illustration, here is the example @chrysn gave me offline:

Let's assume there is an upstream DNS server, a DoC server that serves responses from that DNS server and also has its own DNS cache, and a (DoC-agnostic) CoAP proxy between the DoC server and one or more DoC clients. A DoC client requests an A record for example.org, which leads to the following situation:

  • Upstream DNS sent: example.org TTL=300 IN A 192.0.2.7
  • DNS cache at DoC server, however, had that response already for 100s and sends: example.org TTL=200 IN A 192.0.2.7
  • DoC server, as such, sends: Max-Age=200, ETag=h, 2.05 Content, [example.org TTL=0 IN A 192.0.2.7]
    • h is a hash over the response payload [example.org TTL=0 IN A 192.0.2.7]
  • The CoAP proxy receives that and caches it for 200s, in a manner where the entry is only deleted on cache eviction and is otherwise just marked stale
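To make the role of h in the list above concrete, here is a small sketch of why zeroing the TTL keeps the hash stable. The text serialization of the record, the use of SHA-256, and the names `etag_for` and `doc_response` are all assumptions for illustration; a real DoC server would hash the actual DNS message payload.

```python
import hashlib


def etag_for(payload: bytes) -> bytes:
    """Hypothetical ETag: truncated hash of the payload
    (CoAP ETag values are at most 8 bytes long)."""
    return hashlib.sha256(payload).digest()[:8]


def doc_response(name: str, ttl: int, rdata: str) -> dict:
    """Build a simplified DoC response for a single A record.

    The TTL moves into Max-Age; the payload always carries TTL=0.
    """
    payload = f"{name} TTL=0 IN A {rdata}".encode()
    return {"max_age": ttl, "etag": etag_for(payload), "payload": payload}


# Fresh from upstream (TTL=300) versus 100s old in the DNS cache (TTL=200).
fresh = doc_response("example.org", 300, "192.0.2.7")
aged = doc_response("example.org", 200, "192.0.2.7")
```

Both responses differ only in Max-Age; payload and ETag are identical, which is what makes later revalidation possible.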

20 minutes later, the CoAP proxy receives another request for an A record for example.org. The response is in the cache but marked stale, so the proxy asks the DoC server for revalidation by sending the ETag h. The DoC server no longer has the response in its DNS cache and thus asks upstream:

  • Upstream DNS still sends: example.org TTL=300 IN A 192.0.2.7
  • DoC server receives that and prepares: Max-Age=300, ETag=h, 2.05 Content [example.org TTL=0 IN A 192.0.2.7]
  • Since h is still the same, it can just send: Max-Age=300, ETag=h, 2.03 Valid
  • The CoAP proxy receives that response and just needs to un-stale the cache entry (unless, of course, it was evicted in the meantime, in which case it should re-request without ETag).
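The server-side decision in this second exchange can be sketched as follows. Again this is only an illustration under assumptions: `revalidate` is a made-up helper, responses are modelled as plain dicts, and the ETag is a truncated SHA-256 over a text form of the payload rather than over a real DNS message.

```python
import hashlib
from typing import Iterable


def revalidate(request_etags: Iterable[bytes], max_age: int, payload: bytes) -> dict:
    """Answer with 2.03 Valid if the client's ETag still matches the
    freshly prepared payload, otherwise with a full 2.05 Content."""
    etag = hashlib.sha256(payload).digest()[:8]
    if etag in set(request_etags):
        # Same representation: send no payload, only a new Max-Age.
        return {"code": "2.03 Valid", "max_age": max_age, "etag": etag, "payload": b""}
    return {"code": "2.05 Content", "max_age": max_age, "etag": etag, "payload": payload}


payload = b"example.org TTL=0 IN A 192.0.2.7"
h = hashlib.sha256(payload).digest()[:8]

# Proxy revalidates with the ETag it cached earlier.
valid = revalidate([h], 300, payload)
# Proxy evicted the entry and re-requests without an ETag.
full = revalidate([], 300, payload)
```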

TODO: Adapt examples!

@miri64 miri64 requested review from chrysn and waehlisch March 7, 2022 11:13
@miri64
Collaborator Author

miri64 commented Mar 7, 2022

I find the RECOMMENDED/MUST dualism for the DoC server part a bit clunky myself at the moment. Maybe someone else has an idea for better wording here?

@cgundogan
Collaborator

It's a pity that Max-Age SHOULD be updated on cache hits, and not MUST. This can lead to the problem that we also saw in some NDN implementations, where responses wander from cache to cache without ever getting stale.

For our use case, this means that we might deliver already invalid DNS responses, but I think there is no way around that if we want to use the Max-Age and E-Tag functionality of CoAP (which we should). @chrysn are you aware of any proxy implementation that decided to not follow this SHOULD?

We should probably state this problem somewhere in the document, but perhaps it's not that critical to include it in this upcoming version.

@chrysn
Member

chrysn commented Mar 7, 2022 via email

@cgundogan
Collaborator

Section 5.7.1 is pretty explicit on the upper bound:
If a response is generated out of a cache, the generated (or implied) Max-Age Option MUST NOT extend the max-age originally set by the server, considering the time the resource representation spent in the cache.
The SHOULD in 5.10.5 is only about retransmissions, so it only adds significant time in case of retransmissions, and never in total more than the time spent waiting for it -- how could that make cache entries go around indefinitely?

The section you reference is indeed very explicit about it. This solves my concerns about the caching time.
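As a rough illustration of the upper bound quoted from Section 5.7.1, a cache could compute the Max-Age it hands out like this. The function name `aged_max_age` is hypothetical and the rounding choice is an assumption; the point is only that time spent in the cache is subtracted, so the value can never grow from cache to cache.

```python
import math


def aged_max_age(original_max_age: int, seconds_in_cache: float) -> int:
    """Max-Age for a response served from cache.

    The elapsed time is rounded up so that the result never extends
    the lifetime originally set by the server.
    """
    return max(0, original_max_age - math.ceil(seconds_in_cache))


# Server sent Max-Age=200; the entry has been cached for 80s.
after_first_cache = aged_max_age(200, 80)
# A second cache holds the already-aged response for another 50s.
after_second_cache = aged_max_age(after_first_cache, 50)
```

With the TTLs in the payload set relative to Max-Age, a DoC client that adds `after_second_cache` to a TTL of 0 ends up with the 70s that actually remain.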

@miri64 miri64 changed the title Caching: move Max-Age / TTL calculation part to DoC server Caching: move bulk of Max-Age / TTL calculation part to DoC server Mar 7, 2022
@miri64
Collaborator Author

miri64 commented Mar 7, 2022

I find the RECOMMENDED/MUST dualism for the DoC server part a bit clunky myself at the moment. Maybe someone else has an idea for better wording here?

I tried my hand at that in 939a7c5... not sure. Hmm, maybe we should drop the RECOMMENDED part and just require the DoC server to do it that way?

@miri64
Collaborator Author

miri64 commented Apr 4, 2022

I noticed something while evaluating this: Some resolvers (or at least dnspython, which I use for aiodnsprox, but I have also seen it with dnsmasq) shuffle the answers by default if there are multiple, so that each answer gets picked at some point (e.g. if the client just uses the first address). So, if there are multiple answers, the likelihood of a content-based ETag staying the same is very small, even if we adapt the TTLs (and even with different content formats, unless we somehow keep the state of responses for those...).

@chrysn
Member

chrysn commented Apr 4, 2022 via email

@miri64
Collaborator Author

miri64 commented Apr 4, 2022

If they may shuffle, we may sort?

Let's put a pin into that at least as an option 😁
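A minimal sketch of the "we may sort" idea, assuming answers are represented as plain strings with the TTL already zeroed and the ETag is a truncated SHA-256 (both are illustrative assumptions, as are the names `canonical_payload` and `etag_for`). The DoC server would put the answers into a fixed order before building the payload, so that resolver-side shuffling no longer changes the bytes being hashed.

```python
import hashlib
from typing import List


def canonical_payload(answers: List[str]) -> bytes:
    """Serialize the answers in sorted order so the result does not
    depend on the order in which the resolver returned them."""
    return "\n".join(sorted(answers)).encode()


def etag_for(payload: bytes) -> bytes:
    """Hypothetical content-based ETag (at most 8 bytes in CoAP)."""
    return hashlib.sha256(payload).digest()[:8]


first_lookup = [
    "example.org TTL=0 IN A 192.0.2.7",
    "example.org TTL=0 IN A 192.0.2.8",
]
# Same answers, but shuffled by the resolver on the next lookup.
second_lookup = list(reversed(first_lookup))

sorted_etags_match = (
    etag_for(canonical_payload(first_lookup))
    == etag_for(canonical_payload(second_lookup))
)
unsorted_etags_match = (
    etag_for("\n".join(first_lookup).encode())
    == etag_for("\n".join(second_lookup).encode())
)
```

Note that this trades away the round-robin effect of the shuffling for every client behind the same cache, so it is an option to weigh rather than a free fix.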

@waehlisch
Collaborator

Some resolvers (or at least dnspython, which I use for aiodnsprox, but I also have seen it with dnsmasq) will shuffle the answers as a default,

No surprise. Kind of load balancing, often round-robin.

@chrysn
Member

chrysn commented Apr 4, 2022 via email

@miri64
Collaborator Author

miri64 commented Apr 4, 2022

Some resolvers (or at least dnspython, which I use for aiodnsprox, but I also have seen it with dnsmasq) will shuffle the answers as a default,

No surprise. Kind of load balancing, often round-robin.

I suspected it to be done for that reason.

@ektrah

ektrah commented Apr 4, 2022

@chrysn Entity tags in CoAP don't really have much semantics by themselves (i.e., there is no statement such as "if the bytes making up the representation change, then the entity tag must change as well"). Their semantics comes from the way they're used: If a client has a response with an entity tag in its cache, it can validate that the response is still usable using the ETag Option. IMO it would be perfectly fine if a server responds that a response is still usable when the response is semantically still the same but the bytes have changed.

@chrysn
Member

chrysn commented Apr 4, 2022 via email

@ektrah

ektrah commented Apr 4, 2022

@chrysn Good point. Maybe section 5 should have been more explicit on this, since it's currently silent on the semantics of entity-tags themselves. (There is a well-hidden side note in section 10, though, that is indeed pretty explicit that "CoAP ETags are always strong ETags in the HTTP sense; CoAP does not have the equivalent of HTTP weak ETags"). Too bad. 🤷

Co-authored-by: chrysn <chrysn@fsfe.org>
@miri64
Collaborator Author

miri64 commented Jun 23, 2022

Rebased to current master.

All relevant things on caching were discussed in 4.3.2, considerations
on proxy / DoC server behavior and late responses are general CoAP
problems.
@miri64 miri64 merged commit 64b7e20 into core-wg:main Jun 23, 2022
@miri64 miri64 deleted the proxies-and-caching2 branch June 23, 2022 11:38
@miri64 miri64 mentioned this pull request Jul 26, 2022
5 participants