Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Validating DNSSEC for Signed Domains #1708

Closed
kevincox opened this issue May 13, 2022 · 14 comments · Fixed by #2084
Closed

Support Validating DNSSEC for Signed Domains #1708

kevincox opened this issue May 13, 2022 · 14 comments · Fixed by #2084

Comments

@kevincox
Copy link

kevincox commented May 13, 2022

Right now it appears that trust-dns has two options surrounding DNSSEC.

  1. Always validate, fail if the record is not signed.
  2. Never validate.

What I would like to propose is:

  1. Validate if the zone is signed. If we can get proof that a zone is unsigned allow unvalidated queries. If we don't get a valid signature and can't prove that the zone is unsigned fail the request.

This provides security for domains that support DNSSEC while not breaking the web for the domains that don't.

@bluejekyll
Copy link
Member

I think this is accurate and represents a bit of a deficiency in the current implementation.

I think to properly encode this, we will need NSEC validation that the DS record does not exist for the zones we're attempting to verify. I haven't implemented this, but I've thought that in in the past we should really be "marking" the verified records in some way as being valid DNSSEC signed records vs. invalidly signed records vs. unsigned records. This could be done with an Enum in the Message and wrapping each Record. Though it would require some deviation between request messages and response messages to achieve that.

@wez
Copy link

wez commented Aug 31, 2023

I'm looking at supporting DANE in an MTA and this feels like it would be key to implementing it securely.

How much effort would it be to implement the changes proposed in this issue? I'm open to sitting down to do this work, with the big caveat that I am not a DNSSEC guy and don't know this codebase. Is this "just" a matter of refactoring to expose the right bits in the right places?

I'm the point where I need to consider whether I should find another resolver implementation; I'm not enthusiastic about doing that!

Are there alternative ways I could work around this?

@djc
Copy link
Collaborator

djc commented Sep 1, 2023

From reading @bluejekyll's suggestions I would guess that it is indeed "just" a matter of propagating the validation context up through the call stack. The deviation between request and response messages sounds potentially hairy, so I'd maybe start there?

@bluejekyll
Copy link
Member

bluejekyll commented Sep 3, 2023

Ok, here's the thing I'm wondering. We have this type: DnsResponse which stores a Message and the buffer of the raw response today. I'm considering us adding something to that that would be some map of response records (not sure of storage yet) but something along the lines of HashMap<(Name, RecordType), DnssecChain>. DnssecChain is where we could accumulate the chain of DNSKEYs that validate the record. This type could be use to store the knowledge of the response records. Then the Lookup type would need to be expanded as well to capture the chain from the response. Then we could have a separate method from the current fn iter that could be used for dnssec operations and return records with the associated knowledge from the chain.

I don't know if this is a good idea or not. Obvious downsides would be that we would consume more memory for each Record, including cache. Maybe we could instead of storing the entire chain we could just store the proof result, i.e. DnssecSecure, DnsseFailed, and Unsecured. But here are the things I would consider changing:

  • Create DnnsecChain type.
  • Add Option<DnsecChain> to DnsResponse (we only really need one per record, but there could be many, do we want to capture them all?)
  • change dnssec_dns_handle::verify_* to take a &mut DnssecChain or return one as part of it's result.
  • change the dnssec_dns_handle::verify_* methods to recognize unsigned zones properly (essentially, the parent zone does not have an associated DS record, but is itself signed and verified, chain would be Root -> ParentZone -> Unsigned)
  • change the Lookup type to carry a parallel array or HashMap that has the associated DnssecChain for each Record.

This is a lot of work, and it would be nice to try and limit the breaking changes, so try to keep the changes to interior types and/or add new methods where necessary. I'm not sure if this the "correct" thing, and there are definitely downsides on memory consumption. Thoughts?

@wez
Copy link

wez commented Sep 3, 2023

From the consuming application perspective, the goal is to be able to reason about the signature status.
I don't know what a truly great API for this looks like, but looking at the libunbound API, it exposes this with 2 flags; the associated explanations are lifted from the docs in the header file:

  • secure: True, if the result is validated securely. False, if validation failed or domain queried has no security info. It is possible to get a result with no data (havedata is false), and secure is true. This means that the non-existence of the data was cryptographically proven (with signatures).
  • bogus: If the result was not secure (secure==0), and this result is due to a security failure, bogus is true. This means the data has been actively tampered with, signatures failed, expected signatures were not present, timestamps on signatures were out of date and so on. If !secure and !bogus, this can happen if the data is not secure because security is disabled for that domain name. This means the data is from a domain where data is not signed.

It also provides a why_bogus field that has a human readable explanation for the bogosity.

I don't know if it is necessary to retain the full chain of information internally to be able to report the equivalent information in the trust resolver results. If a component of the chain is bogus, I think it is fine to raise an error at the point of discovery.

In #1650 there is discussion around propagating the AD bit through to the result; to my I-am-not-a-DNS-expert eyes, that sounds like the secure bit, and from a plumbing perspective, it feels like that could be propagated through chains by logical operations on the bit eg: a result is secure if the whole chain is secure, and, assuming that bogus raises errors at the point of discovery, no additional state is strictly necessary to keep track of that, which should keep memory usage more reasonable. That seems conceptually simple, modulo internal API changes needed to support propagating that info.

From the public API, I think what I'm looking for is:

  • It is OK for a lookup to return an Err for the equivalent of the unbound bogus case. That indicates that something is definitely bad about the result and that it should not be used.
  • For the success case, the Lookup result container could keep track of the secure bit and expose its value.

@bluejekyll
Copy link
Member

One of the things I've been struggling with around DNSSEC is that there isn't a perfect guide to what should happen. In so many cases bogus is actually misconfigured DNS as opposed to malicious attack. That's partly why I feel like providing the proof separately is important, rather than failing the request. I think there are 4 states, 3 when the secure bit is set.

As to the ad bit, we could add a flag to the DnsRequestOptions. Right now the ad bit is always enabled in the dnssec_dns_handle, so we probably want to change the flow around how that logic is called to make it conditional.

First to your points, I agree that for your use case (and probably in general) capturing the key chains isn't necessary. But I do think we need to track each verified record in the response properly. The four states we have (which to me implies an enum as part of the return, maybe DnssecStatus?):

  • Unsigned - ad requested, but the zone is unsigned, and it is correctly unsigned as the tld like .com. has an associated NSEC (or NSEC3) record covering the DS record type.
  • Signed - ad requested, the zone is properly signed all the way to the root, ..
  • Bogus - (I'm ok with this name, or others) ad requested, but zone fails validation at any point up the chain (maybe we just track the point at which the validation failed as associated data to this variant?)
  • NonAuthentic - ad not requested, we probably don't really need to track this as it's effectively a non-dnssec request.

In terms of failure with an Err I think given the Unsigned and Signed variants, it's not clear to me that we want to encode this as just Ok vs. Err. I suppose Bogus could always be an error, but given how often DNSSEC fails for random config reasons, I think it might be work saying Ok in all cases and reserve Err for protocol failures. And then give people an interface into Lookup that could be Ok or Err at that point? maybe with different methods that state what is desired.

What do you think?

@wez
Copy link

wez commented Sep 5, 2023

The more I think about DNS results, the more I think that there is a lot more nuance than Ok and Err results, and that trying to categorize things that way is a bit lossy in that it loses some nuance in the results.

I've spent some time wrapping up unbound to play around with this stuff recently; at first I was like "ugh, look at all these discrete bits of information!", but after sitting with it for a bit, I do think that having those available makes sense, and I might even go a little further with the trust result: having NoRecordsFound be an err variant was a little surprising to me; in the unbound API they have a result type that has the list of records (if any), and the rcode holds the status information. That also allows for the no-records-but-not-nxdomain case to be reported as signed. Is an nxdomain rcode something that could also be signed as well?

So in short, yes, I would be totally happy for bogosity to not be automatically treated as an error. I think having an enum with 4 values to reflect the security status makes sense.

I would lean towards Unknown instead of NonAuthentic because NonAuthentic suggests that the result might be a forgery but in that context it seems like we just cannot know if that is true, so a less assertive name feels more appropriate. When it comes to naming things that are closely coupled with security, I think it is important not to use names that could be misinterpreted by folks that are too busy to read docs in detail! Alternatively NotRequested?

Bogus works for me, but if you're shopping for other names, something like FailedValidation seems descriptive of the actual situation, and yeah, having that variant capture context on the first part of the chain that failed validation would be helpful for logging purposes.

@djc
Copy link
Collaborator

djc commented Sep 5, 2023

I like Unknown and FailedValidation better than NonAuthentic and Bogus, FWIW.

@bluejekyll
Copy link
Member

bluejekyll commented Sep 6, 2023

having NoRecordsFound be an err variant was a little surprising to me

Perhaps not the best logic, but I ended up doing that as I needed a way to fail the request to make the async tasks failover cleanly between each other. I think it's debatable if that's an error or not. IMO, we failed to retrieve any records for the request, so it can be interpreted as an error.

Is an nxdomain rcode something that could also be signed as well?

Yes, though it's not actually an NXDOMAIN at that point. It's an NSEC record that shows that there are no records at that name. In DNSSEC I don't believe NXDOMAIN is ever a valid response, in that you should always get some proof that the name does not exist for any record type. (Well, actually NXDOMAIN might be an ok response for Unsigned records, as the zone authority wouldn't have any DNSSEC records, this is currently a gap in the proof chain implemented in dnssec_dns_handle).

I like the name suggestions. So I think the new type we're discussing is this, I put in the record idea for the FailedValidation, this would be useful for debugging a zone config without capturing the entire chain:

pub enum DnssecStatus {
    /// Domain is signed  and passed validation
    Signed,
    /// Domain is known to be unsigned and parent zone proves this
    Unsigned,
    /// Domain should be signed and either DNSKEYs were not available or verification failed
    FailedValidation{record_failed_validation: Option<(Name, RecordType)>},
    /// DNSSEC was not requested, therefor is unknown
    Unknown,
}

I'm still not sure how we should track this in the DnsResponse or Lookup yet. I'm not sure if it's a good or bad idea to have that as a parallel list in a separate field, or wrap the Records in something like DnssecRecord(Record, DnssecStatus).

@wez
Copy link

wez commented Sep 6, 2023

I'm still not sure how we should track this in the DnsResponse or Lookup yet. I'm not sure if it's a good or bad idea to have that as a parallel list in a separate field, or wrap the Records in something like DnssecRecord(Record, DnssecStatus).

I would lean away from a parallel list, as that pattern can end up feeling awkward and more complex in the long run, even if perhaps it might be simpler to integrate now.

@bluejekyll
Copy link
Member

Sounds good. We'll see how messy it gets :)

Are you comfortable exploring this change? Are there any other points you want to discuss? I think this is going to end up changing Message unless you can come up with a different pattern there. Perhaps making Message generic with a default parameter to Record, like struct Message<R = Record> where R: Into<Record>?

@djc
Copy link
Collaborator

djc commented Sep 8, 2023

BTW, I found https://datatracker.ietf.org/doc/html/rfc4035#section-4.3 to say:

A security-aware resolver MUST be able to determine whether it should expect a particular RRset to be signed. More precisely, a security-aware resolver must be able to distinguish between four cases:

  • Secure: An RRset for which the resolver is able to build a chain of signed DNSKEY and DS RRs from a trusted security anchor to the RRset. In this case, the RRset should be signed and is subject to signature validation, as described above.
  • Insecure: An RRset for which the resolver knows that it has no chain of signed DNSKEY and DS RRs from any trusted starting point to the RRset. This can occur when the target RRset lies in an unsigned zone or in a descendent of an unsigned zone. In this case, the RRset may or may not be signed, but the resolver will not be able to verify the signature.
  • Bogus: An RRset for which the resolver believes that it ought to be able to establish a chain of trust but for which it is unable to do so, either due to signatures that for some reason fail to validate or due to missing data that the relevant DNSSEC RRs indicate should be present. This case may indicate an attack but may also indicate a configuration error or some form of data corruption.
  • Indeterminate: An RRset for which the resolver is not able to determine whether the RRset should be signed, as the resolver is not able to obtain the necessary DNSSEC RRs. This can occur when the security-aware resolver is not able to contact security-aware name servers for the relevant zones.

So maybe we could just reuse these names.

@bluejekyll
Copy link
Member

Thanks for digging that up. We could even just cut and paste those descriptions into the rust-doc comments for the types. Thanks, @djc for digging that up. Also, note, we currently don't properly detect Insecure. But I am glad that my understanding of the space matches the descriptions there, for the most part.

Indeterminate is an interesting case, basically it could be either, the resolver wasn't asked for DNSSEC records, or the upstream DNS server isn't able to send them back due to being old or misconfigured.

@bluejekyll
Copy link
Member

btw, I'm going to take a stab at implementing this as I think it will unblock some other work and it might touch enough interfaces that it's worth having a discussion now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants