Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry logic to DNS that attempts to retry against a public resolver. Ensure that report indicates level of confidence in the correctness of the result. #41

Closed
schrolla opened this issue Dec 22, 2022 · 8 comments · Fixed by #177
Assignees
Labels
bug This issue or pull request addresses broken functionality
Milestone

Comments

@schrolla
Copy link
Collaborator

Agency 2 assessment showed a "FAIL" but when looking into the domains flagged by the assessment script, many of which were sub-domains to ones configured in the DNS records hosted by the agency's domain. Agency 2 also mentioned that they noticed two domains that should be approved.

Ethan did check some domains, one did provide a SFT and one provided a “null” output. Ethan also investigated and it could be some weird DNS lookup failures on our end. Maybe we just try to do more testing against tenants with many (think Agency 2 has 60) domains.

@nanda-katikaneni
Copy link
Collaborator

nanda-katikaneni commented Jan 12, 2023

Did EXO assessment on a E3 tenant (Test Tenant#5)- most policies worked fine but had an intermittent issue with Policy 2.2. In some of the tests, the policy errored with GetSpfRecord command failure. This maybe DNS lookup failures on the test environment - but I could not consistently reproduce the issue. The tenant had 12 domains - may need some additional testing and improved error handling to identify transient issues.

@adhilto
Copy link
Collaborator

adhilto commented Jan 12, 2023

I recommend we create an issue to enhance the DNS request code for SPF, DKIM, and DMARC. Transient issues are to be expected with DNS and our code currently isn't built to handle that appropriately. We should add a re-try mechanism so that for any failed query (other than for an NXDOMAIN) the query gets re-tried.

Also in favor of additional testing.

@schrolla
Copy link
Collaborator Author

Recommendation is to add an issue to create or re-create issue to look at adding a DNS retry to the current automation to account for potential intermittent DNS failures.

@nanda-katikaneni nanda-katikaneni self-assigned this Jan 17, 2023
@ethanb-cisa
Copy link
Contributor

ethanb-cisa commented Jan 31, 2023

I believe the issue is related to where the agency runs the script. Querying a domain from internal may hit an internal DNS server that returns a different result than what is served to the public for that domain. This is the case inside CISA. Resolve-DnsName -Type TXT -Name cisa.dhs.gov | fl returns no SPF but running from an Azure VM does.

Some orgs will block DNS requests to non-agency DNS servers, so it's not like we could set a public DNS server as the recursive resolver (1.1.1.1, etc). I believe we even require them to prevent this to prevent DNS filtering bypass.

This likely also impacts DKIM/DMARC.

So figuring that out will be...interesting.

@adhilto
Copy link
Collaborator

adhilto commented Jan 31, 2023

You're probably right. I tested the Get-ScubaSpfRecords against one of the domains that was failing in yesterday's review, it worked just fine. That said, when we overhauled EXO's error handling, we did improve the DNS code. Now if the queries fail due to a network issue, the report will display the error rather than just saying there were no SPF/DKIM/DMARC records. I'd be curious to see what the output from these agencies looks like on the latest code.

@gdasher
Copy link
Collaborator

gdasher commented Feb 2, 2023

I rescoped #38 to be about the retries. Its possible those issues are also split horizon issues, though it surprises me a bit. But it is possible they fully claim their own zones and don't think about things like txt records that only matter for inbound traffic. We should discuss in team meeting how we want to rescope these issues and move forward.

@nanda-katikaneni
Copy link
Collaborator

Now if the queries fail due to a network issue, the report will display the error rather than just saying there were no SPF/DKIM/DMARC records. I'd be curious to see what the output from these agencies looks like on the latest code.

Yes, this is the behavior I saw in my tests on E3 tenant. The new error handling is correctly flagging when spf records are missing; when the dns queries are failed due to network issues, it raised the error on the terminal output with a corresponding note in the report as below.
Screen Shot 2023-02-02 at 1 17 54 PM

@gdasher gdasher changed the title Agency 2 Pilot: Exchange Online Policy 2.2; An SPF policy(s) that designates only these addresses as approved senders SHALL be published bug Add retry logic to DNS that attempts to retry against a public resolver. Ensure that report indicates level of confidence in the correctness of the result. Feb 2, 2023
@schrolla
Copy link
Collaborator Author

Details of potential fix from duplicate issue.

💡 Summary

Enhance DNS-related code by adding a retry over DoH option.

Motivation and context

We've seen some inconsistent results for tests that explicitly make DNS requests (SPF, DKIM, and DMARC checks), possibly due to networking issues. Adding DoH as a fallback should the traditional DNS request fail improves the odds of successfully making a request.

Implementation notes

Example of making a DoH call with Powershell:

$(Invoke-WebRequest -H @{"accept"="application/dns-json"} -Uri "https://1.1.1.1/dns-query?name=dhs.cisa.gov&type=txt").RawContent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue or pull request addresses broken functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants