Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin/cache: key cache on Checking Disabled (CD) bit #6354

Merged
merged 3 commits into from Nov 10, 2023

Conversation

gcs278
Copy link
Contributor

@gcs278 gcs278 commented Oct 11, 2023

This update keys the cache on CD bit, which effectively separates the entries for queries with CD disabled or enabled.

1. Why is this pull request needed and what does it do?

The cache plugin currently does not have separate entries for CD disabled/enabled. When a user initially queries WITH the CD bit set, the cache stores the a potentially invalid cache entry. Then if a user queries without CD bit set, CoreDNS will return the invalid cache entry (as if the CD bit was set) which violates rfc4035#section-4.7:

Resolvers MUST NOT return RRsets from the BAD cache unless the resolver is not required to validate the signatures of the RRsets in question under the rules given in Section 4.2 of this document.

And rfc4035.html#section-3.2.2 specifically says our cache should depend on the CD Bit of the original query:

If the resolver side implements a BAD cache (see Section 4.7) and the name server side receives a query that matches an entry in the resolver side's BAD cache, the name server side's response depends on the state of the CD bit in the original query.

2. Which issues (if any) are related?

#6186

3. Which documentation changes (if any) need to be made?

N/A

4. Does this introduce a backward incompatible change or deprecation?

No

@gcs278 gcs278 force-pushed the cache-key-on-cd branch 2 times, most recently from c399895 to abd39ac Compare October 13, 2023 21:58
@codecov
Copy link

codecov bot commented Oct 13, 2023

Codecov Report

Merging #6354 (408a8a3) into master (93c57b6) will increase coverage by 2.79%.
Report is 1078 commits behind head on master.
The diff coverage is 55.64%.

@@            Coverage Diff             @@
##           master    #6354      +/-   ##
==========================================
+ Coverage   55.70%   58.49%   +2.79%     
==========================================
  Files         224      252      +28     
  Lines       10016    16544    +6528     
==========================================
+ Hits         5579     9677    +4098     
- Misses       3978     6278    +2300     
- Partials      459      589     +130     
Files Coverage Δ
core/dnsserver/address.go 100.00% <100.00%> (ø)
core/dnsserver/config.go 0.00% <ø> (ø)
plugin/acl/setup.go 94.49% <100.00%> (+1.81%) ⬆️
plugin/auto/walk.go 76.05% <100.00%> (+2.72%) ⬆️
plugin/auto/zone.go 78.94% <ø> (+0.68%) ⬆️
plugin/autopath/autopath.go 73.61% <ø> (-1.39%) ⬇️
plugin/autopath/setup.go 54.34% <100.00%> (-3.55%) ⬇️
plugin/bufsize/bufsize.go 100.00% <100.00%> (ø)
plugin/bufsize/setup.go 60.60% <100.00%> (+4.08%) ⬆️
plugin/cache/dnssec.go 85.71% <100.00%> (+25.71%) ⬆️
... and 76 more

... and 171 files with indirect coverage changes

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

@candita
Copy link

candita commented Oct 18, 2023

/assign @Miciah

@gcs278
Copy link
Contributor Author

gcs278 commented Oct 24, 2023

@chrisohaver could you put this on your queue to review when you get a chance? Thanks!

@@ -144,20 +149,21 @@ var cacheTestCases = []cacheTestCase{
test.MX("miek.nl. 3600 IN MX 10 aspmx2.googlemail.com."),
test.RRSIG("miek.nl. 1800 IN RRSIG MX 8 2 1800 20160521031301 20160421031301 12051 miek.nl. lAaEzB5teQLLKyDenatmyhca7blLRg9DoGNrhe3NReBZN5C5/pMQk8Jc u25hv2fW23/SLm5IC2zaDpp2Fzgm6Jf7e90/yLcwQPuE7JjS55WMF+HE LEh7Z6AEb+Iq4BWmNhUz6gPxD4d9eRMs7EAzk13o1NYi5/JhfL6IlaYy qkc="),
},
RecursionAvailable: true,
},
shouldCache: false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is out specified for this test case with shouldCache: false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It never uses out. I'll clean it up.

Comment on lines 20 to 22
out test.Case // the expected message coming "out" of cache
in test.Case // the test message going "in" to cache
shouldCache bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first glance, it does seem cleaner to move AuthenticatedData, RecursionAvailable, and Truncated into test.Case and to have an explicit expected out value for each test case. However, it seems that TestCacheInsertion doesn't actually check these fields in out: I can remove everything except Answer or Ns from out in each of the test cases, and tests still pass. That seems strange to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. I think the right thing to do is to add the various header flag checks to the Header function:

func Header(tc Case, resp *dns.Msg) error {

However, i started down this path and saved my progress here: master...gcs278:coredns:unit-test-header-checking, but this seems like a large unit test change and a bit out of the scope of this PR.

So, for the time being, I added the individual checks in TestCacheInsertion (AuthenticatedData, RecursionAvailable, CheckingDisabled, Authoritative). Note that Truncated actually doesn't really need to be checked, because the expectation is that Truncated responses should never be cached.

I can follow up with another PR with the unit test changes. It seems like a gap in coverage that we aren't consistently checking the header flags for the plugins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also realize this PR leaves things in a bit of a transitional phase in which these fields are move into test.Case, but other unit tests are still utilizing adding the header bits at the top level test case structure such as in cname_test.go:

I don't think this is too problematic to be in this transitional state, since I feel like it like the header flags should be transitioned to be specified in the test.Case, but feel free to let me know if you feel differently.

query: test.Case{
Qname: "example.org.",
Qtype: dns.TypeA,
Truncated: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense for query to be truncated? Anyway, the test still passes if I delete this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope that doesn't make sense, I think I was just iterating through the header fields, but TC bit only matters for responses. We won't cache truncated entries, and that test is covered in TestCacheInsertion, so I'll delete this.

Comment on lines 119 to 187
if !v.fetch {
t.Fatalf("After %s: want request to trigger a prefetch", v.after)
}
break
case <-time.After(time.Second):
t.Fatalf("After %s: want request to trigger a prefetch", v.after)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're in here, does want, got := rec.Rcode, dns.RcodeSuccess a few lines below have the values reversed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@chrisohaver chrisohaver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks!

And thank you for your superb PR description!

Copy link
Contributor Author

@gcs278 gcs278 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -144,20 +149,21 @@ var cacheTestCases = []cacheTestCase{
test.MX("miek.nl. 3600 IN MX 10 aspmx2.googlemail.com."),
test.RRSIG("miek.nl. 1800 IN RRSIG MX 8 2 1800 20160521031301 20160421031301 12051 miek.nl. lAaEzB5teQLLKyDenatmyhca7blLRg9DoGNrhe3NReBZN5C5/pMQk8Jc u25hv2fW23/SLm5IC2zaDpp2Fzgm6Jf7e90/yLcwQPuE7JjS55WMF+HE LEh7Z6AEb+Iq4BWmNhUz6gPxD4d9eRMs7EAzk13o1NYi5/JhfL6IlaYy qkc="),
},
RecursionAvailable: true,
},
shouldCache: false,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It never uses out. I'll clean it up.

Comment on lines 20 to 22
out test.Case // the expected message coming "out" of cache
in test.Case // the test message going "in" to cache
shouldCache bool
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. I think the right thing to do is to add the various header flag checks to the Header function:

func Header(tc Case, resp *dns.Msg) error {

However, i started down this path and saved my progress here: master...gcs278:coredns:unit-test-header-checking, but this seems like a large unit test change and a bit out of the scope of this PR.

So, for the time being, I added the individual checks in TestCacheInsertion (AuthenticatedData, RecursionAvailable, CheckingDisabled, Authoritative). Note that Truncated actually doesn't really need to be checked, because the expectation is that Truncated responses should never be cached.

I can follow up with another PR with the unit test changes. It seems like a gap in coverage that we aren't consistently checking the header flags for the plugins.

query: test.Case{
Qname: "example.org.",
Qtype: dns.TypeA,
Truncated: true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope that doesn't make sense, I think I was just iterating through the header fields, but TC bit only matters for responses. We won't cache truncated entries, and that test is covered in TestCacheInsertion, so I'll delete this.

Comment on lines 119 to 187
if !v.fetch {
t.Fatalf("After %s: want request to trigger a prefetch", v.after)
}
break
case <-time.After(time.Second):
t.Fatalf("After %s: want request to trigger a prefetch", v.after)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@Miciah Miciah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few comments, but it's all trivial things that you can feel free to ignore or leave to follow-up clean-up PRs.
/lgtm

Comment on lines 318 to 327
t.Error("Expected Authoritative Answer bit to be true, but was false")
}
if resp.AuthenticatedData != tc.out.AuthenticatedData {
t.Errorf("Expected Authenticated Data bit to be %t, but got %t", tc.out.AuthenticatedData, resp.AuthenticatedData)
}
if resp.RecursionAvailable != tc.out.RecursionAvailable {
t.Errorf("Expected Recursion Available bit to be %t, but got %t", tc.out.RecursionAvailable, resp.RecursionAvailable)
}
if resp.CheckingDisabled != tc.out.CheckingDisabled {
t.Errorf("Expected Checking Disabled bit to be %t, but got %t", tc.out.CheckingDisabled, resp.CheckingDisabled)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error messages above include the index of the test case, so it would make sense for these error messages to include it as well:

Suggested change
t.Error("Expected Authoritative Answer bit to be true, but was false")
}
if resp.AuthenticatedData != tc.out.AuthenticatedData {
t.Errorf("Expected Authenticated Data bit to be %t, but got %t", tc.out.AuthenticatedData, resp.AuthenticatedData)
}
if resp.RecursionAvailable != tc.out.RecursionAvailable {
t.Errorf("Expected Recursion Available bit to be %t, but got %t", tc.out.RecursionAvailable, resp.RecursionAvailable)
}
if resp.CheckingDisabled != tc.out.CheckingDisabled {
t.Errorf("Expected Checking Disabled bit to be %t, but got %t", tc.out.CheckingDisabled, resp.CheckingDisabled)
t.Errorf("Test %d: Expected Authoritative Answer bit to be true, but was false", n)
}
if resp.AuthenticatedData != tc.out.AuthenticatedData {
t.Errorf("Test %d: Expected Authenticated Data bit to be %t, but got %t", n, tc.out.AuthenticatedData, resp.AuthenticatedData)
}
if resp.RecursionAvailable != tc.out.RecursionAvailable {
t.Errorf("Test %d: Expected Recursion Available bit to be %t, but got %t", n, tc.out.RecursionAvailable, resp.RecursionAvailable)
}
if resp.CheckingDisabled != tc.out.CheckingDisabled {
t.Errorf("Test %d: Expected Checking Disabled bit to be %t, but got %t", n, tc.out.CheckingDisabled, resp.CheckingDisabled)

On the other hand, I notice that some t.Errors below don't print the index. Rather than changing all the error messages, might be simpler to rewrite this test to use t.Run. Alternatively, if you want to keep things simple in this PR, just t.Log("Test", n) at the top of the loop.

On the gripping hand, it's just an error message in a test that shouldn't be failing anyway, so ¯\_ (ツ)_/¯.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I bit bullet and added another commit that wraps TestCacheInsertion in t.Run. I moved the test case struct into the unit test function, so you can reap the benefit of running individual test cases (at least with GoLand and I presume go test as well).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file name has a typo: prefech_test.go should be prefetch_test.go. Potential follow-up clean-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, didn't even notice. Added as a separate commit, but let me know if it's too much for this PR.

if !v.fetch {
t.Fatalf("After %s: want request to trigger a prefetch", v.after)
}
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need break here? Wouldn't an empty case clause have the same effect? I know I reviewed this code before and didn't comment on this line then, but I can't see a reason for it now, so if there is a valid reason, it's subtle enough that it warrants a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - removed, and added a comment for readability.

},
},
{
name: "CD bit and DO Bit should be unique",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalization is inconsistent here and a few other places: "bit" versus "Bit". (I hesitate to comment on such a trivial thing...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, I don't mind fixing nits while I'm fixing other things.

Key the cache on CD bit, which effectively separates the entries for
queries with CD disabled or enabled.

Signed-off-by: Grant Spence <gspence@redhat.com>
@gcs278 gcs278 force-pushed the cache-key-on-cd branch 4 times, most recently from e071d58 to ea85386 Compare November 9, 2023 22:07
Comment on lines 302 to 305
t.Errorf("Cached message that should not have been cached: %s", state.Name())
}
if resp.RecursionAvailable != tc.out.RecursionAvailable {
t.Errorf("Expected Recursion Available bit to be %t, but got %t", tc.out.RecursionAvailable, resp.RecursionAvailable)
}
if resp.CheckingDisabled != tc.out.CheckingDisabled {
t.Errorf("Expected Checking Disabled bit to be %t, but got %t", tc.out.CheckingDisabled, resp.CheckingDisabled)
if tc.shouldCache && !found {
t.Errorf("Did not cache message that should have been cached: %s", state.Name())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be t.Fatalf now. You still want to go to the next test case (which pre-t.Run-refactoring was done by continue) rather than do all the if resp.foo != tc.out.foo { t.Errorf(...) } checks below if you didn't find a cache entry or got an unexpected cache entry. (Alternatively, you could make the if found into an else if found.)

Copy link
Contributor Author

@gcs278 gcs278 Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - I debated internally about this and thought it wouldn't hurt to continue checking the improperly found cache item, but now that seems pointless since the test is already invalid.

Wrap TestCacheInsertion unit test in t.Run and provide descriptive
names for the test cases.

Signed-off-by: Grant Spence <gspence@redhat.com>
Rename prefech_test.go  to prefetch_test.go

Signed-off-by: Grant Spence <gspence@redhat.com>
@Miciah
Copy link
Contributor

Miciah commented Nov 10, 2023

Thanks!
/lgtm

@gcs278
Copy link
Contributor Author

gcs278 commented Nov 10, 2023

@chrisohaver ready for your final review

@chrisohaver chrisohaver merged commit 997c7f9 into coredns:master Nov 10, 2023
13 checks passed
gcs278 added a commit to gcs278/coredns that referenced this pull request Nov 13, 2023
plugin/cache: key cache on Checking Disabled (CD) bit (coredns#6354)

* plugin/cache: key cache on Checking Disabled (CD) bit

Key the cache on CD bit, which effectively separates the entries for
queries with CD disabled or enabled.

Signed-off-by: Grant Spence <gspence@redhat.com>
gcs278 added a commit to gcs278/coredns that referenced this pull request Nov 30, 2023
plugin/cache: key cache on Checking Disabled (CD) bit (coredns#6354)

* plugin/cache: key cache on Checking Disabled (CD) bit

Key the cache on CD bit, which effectively separates the entries for
queries with CD disabled or enabled.

Signed-off-by: Grant Spence <gspence@redhat.com>
gcs278 added a commit to gcs278/coredns that referenced this pull request Dec 4, 2023
plugin/cache: key cache on Checking Disabled (CD) bit (coredns#6354)

* plugin/cache: key cache on Checking Disabled (CD) bit

Key the cache on CD bit, which effectively separates the entries for
queries with CD disabled or enabled.

Signed-off-by: Grant Spence <gspence@redhat.com>
@SuperQ SuperQ mentioned this pull request Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants