Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin/forward: Continue waiting after receiving malformed responses #6014

Merged
merged 17 commits into from
Apr 29, 2023

Conversation

chrisohaver
Copy link
Member

1. Why is this pull request needed and what does it do?

When waiting for a response from upstream, continue waiting when receiving malformed responses, until timeout or a valid response is received. This aims to prevent spoofed malformed responses from blocking the real answer from upstream before it arrives.

2. Which issues (if any) are related?

none

3. Which documentation changes (if any) need to be made?

none

4. Does this introduce a backward incompatible change or deprecation?

no

@chrisohaver
Copy link
Member Author

OK - I've written an integration test for this ... this is my first experience with the gopacket package. The result isn't pretty but it does demonstrate the vulnerability if the fix is removed, and of course shows that with the fix in place, the vulnerability is absent.

It sniffs and injects traffic on the loopback interface, so understandable if we may not want to add it to tests.
And it requires libpcap to be installed... which apparently isn't present in our action runners. So if we do include this, we will want to add build tags to split it out from the other tests.

There are kludgey mac/bsd vs linux checks due to loopback encapsulation differences.

@codecov-commenter
Copy link

codecov-commenter commented Apr 19, 2023

Codecov Report

Merging #6014 (071b059) into master (93c57b6) will increase coverage by 1.80%.
The diff coverage is 55.18%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master    #6014      +/-   ##
==========================================
+ Coverage   55.70%   57.50%   +1.80%     
==========================================
  Files         224      247      +23     
  Lines       10016    16204    +6188     
==========================================
+ Hits         5579     9318    +3739     
- Misses       3978     6322    +2344     
- Partials      459      564     +105     
Impacted Files Coverage Δ
core/dnsserver/config.go 0.00% <ø> (ø)
core/dnsserver/register.go 13.79% <0.00%> (-4.19%) ⬇️
plugin/auto/auto.go 0.00% <0.00%> (ø)
plugin/auto/xfr.go 0.00% <0.00%> (ø)
plugin/auto/zone.go 78.94% <ø> (+0.68%) ⬆️
plugin/autopath/autopath.go 73.61% <ø> (-1.39%) ⬇️
plugin/azure/azure.go 10.61% <0.00%> (-0.57%) ⬇️
plugin/azure/setup.go 56.36% <0.00%> (-4.14%) ⬇️
plugin/backend_lookup.go 0.00% <0.00%> (ø)
plugin/bind/bind.go 50.00% <0.00%> (-50.00%) ⬇️
... and 78 more

... and 164 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
@rdrozhdzh
Copy link
Contributor

does this fix make sense for TCP/TLS? It's hardly that you can catch correct response after parsing some garbage from TCP stream.

Moreover, this new behavior makes the code vulnerable to DoS attack, in case if malware server returns a long stream of garbage via TCP - coredns will have to spend CPU on parsing all this garbage.

Does it make sense to enable this new logic only for UDP?

@chrisohaver
Copy link
Member Author

Does it make sense to enable this new logic only for UDP?

Yes, that was my intent, but forgot to add that check.

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
@chrisohaver
Copy link
Member Author

chrisohaver commented Apr 21, 2023

Moreover, this new behavior makes the code vulnerable to DoS attack, in case if malware server returns a long stream of garbage via TCP - coredns will have to spend CPU on parsing all this garbage.

@rdrozhdzh, You mention TCP here, but IIUC your concern would also apply to UDP connections as well?

I agree that since TCP is not vulnerable to the "spoofed malformed response" this PR intends to close, we should only apply this behavior to UDP.

However for UDP I guess we have to decide which vulnerability better to live with.

edit: IOW, are you concerned that the vulnerability this introduces is worse than the one is closes?

@rdrozhdzh
Copy link
Contributor

My previous comment was rather about TCP, since there were no chances to get valid response after getting malformed message.

As for UDP, it may make sense. If attacker bombards UDP port(s) with malformed messages (garbage), without this fix coredns will immediately return an error to client, which is also Deny-of-Service. With this fix, coredns has chance to return valid response to client, i.e. it can resist this attack (by the cost of extra CPU usage).

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
@chrisohaver
Copy link
Member Author

chrisohaver commented Apr 25, 2023

I propose we merge as is, with the integration test disabled in CI, since I cannot figure out how to get the test to work in the CI environment.

@Tantalor93
Copy link
Collaborator

Tantalor93 commented Apr 26, 2023

@chrisohaver couldnt we simplify the integration test in a way that it could be run in CI? Test not running in CI is quite useless test.

I propose we focus on the fact that malformed upstream response is skipped. The test with that sniffing etc. is quite heavyweight imho and if it is not runnable in our CI not usable.

I propose just write a test without some kind of spoofing:

  • setup upstream DNS server which on first request returns some invalid dns response and on second request valid response
  • run forward plugin and pass it valid dns request and expect valid response

@chrisohaver
Copy link
Member Author

  • setup upstream DNS server which on first request returns some invalid dns response and on second request valid response
  • run forward plugin and pass it valid dns request and expect valid response

I started with this simpler approach first, but ran into complications because as far as I could tell, the miekg/dns does not support sending malformed dns responses, and doesn't expose enough info about connections to enable inserting hand made malformed packets. Working around that started becoming complicated.

I also experimented with a "malicious gateway" style approach, where a custom gateway listener sits between the forward plugin and upstream dns, and can drop/insert malformed responses at will. I think this should be feasible, but I wasn't able to get it to work, so I moved on to a more literal sniffer approach.

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
Copy link
Collaborator

@Tantalor93 Tantalor93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

@Tantalor93 Tantalor93 merged commit 604a902 into coredns:master Apr 29, 2023
14 checks passed
chrisohaver added a commit that referenced this pull request Aug 14, 2023
chrisohaver added a commit that referenced this pull request Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants