Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer dnspython to py3dns #25

Merged
merged 16 commits into from
Nov 22, 2017
Merged

Prefer dnspython to py3dns #25

merged 16 commits into from
Nov 22, 2017

Conversation

jsf9k
Copy link
Member

@jsf9k jsf9k commented Nov 16, 2017

During testing we found that trustymail was not always idempotent. For example, two consecutive runs of trustymail --debug --json --spf ed.gov were observed to give different results. It appears that the py3dns library does not always correctly handle the case where a DNS response is too big to fit in a UDP packet. Further testing showed that the dnspython library was more reliable.

As a result, I changed trustymail to use dnspython instead of py3dns. I also modified requirements.txt and setup.py to add the new requirement.

This should resolve #24.

For example, two consecutives of trustymail --debug --json --spf
ed.gov were observed to give different results.  It appears that the
py3dns library does not always correctly handle the case where a DNS
response is too big to fit in a UDP packet.  Further testing showed
that the dnspython library was more reliable.

As a result, I changed trustymail to use dnspython instead of py3dns.
I also modified requirements.txt and setup.py to add the new
requirement.
@jsf9k jsf9k requested a review from h-m-f-t November 16, 2017 19:07
@jsf9k
Copy link
Member Author

jsf9k commented Nov 16, 2017

Using dnspython also allows us to set the DNS servers to use on the trustymail command line. I'll add that feature as well.

Copy link
Collaborator

@IanLee1521 IanLee1521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a few fixes you may want to make, but nothing earth shattering. 👍

@@ -58,6 +58,7 @@
'requests',
'docopt',
'publicsuffix',
'dnspython',
'py3dns',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to remove py3dns from the requirements?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try removing py3dns, but then I got errors regarding a missing import that spf uses. So I put it back in.


if record_text.startswith("\""):
record_text = record[1:-1]
record_text = record_text[1:-1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this (record -> record_text) change the logical behavior of this function? or just something that is different about the two DNS libraries?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I wonder if rather than the conditional, it would be better to simplify to:

record_text = record_text.lstrip('"')

Copy link
Member Author

@jsf9k jsf9k Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First point: The new library returns an object. If it's an MX record then you have to access the exchange and preference member variables. If you treat the object as a string you get something like 10 mail.blah.com.

Second point: I think you're right, and I'll make the change.

handle_error("[DMARC]", domain, error)


def find_host_from_ip(ip_addr):
return DNS.revlookup(ip_addr)
return str(resolver.query(reversename.from_address(ip_addr), "PTR")[0])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'm a fan of splitting major indices out, to clarify what they are. Maybe something like:

hostname, _ =  resolver.query(reversename.from_address(ip_addr), "PTR")
return str(hostname)

?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, does reversename.from_address(ip_addr) not provide a hostname itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First point: I agree and will make the change.

Second point: No, it does not:

Python 3.6.3 (default, Oct 24 2017, 14:48:20) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dns.resolver
>>> import dns.reversename
>>> record = dns.resolver.query("www.npr.org", "A")
>>> print(record.response)
id 39913
opcode QUERY
rcode NOERROR
flags QR RD RA
;QUESTION
www.npr.org. IN A
;ANSWER
www.npr.org. 172 IN CNAME wildcard.npr.org.edgekey.net.
wildcard.npr.org.edgekey.net. 20988 IN CNAME e4539.g.akamaiedge.net.
e4539.g.akamaiedge.net. 19 IN A 23.46.53.110
;AUTHORITY
;ADDITIONAL
>>> n = dns.reversename.from_address("23.46.53.110")
>>> print(n)
110.53.46.23.in-addr.arpa.
>>> 

  user to specify the DNS hosts against which to query
* Added another exception in the catch blocks that follow calls to
  dns.resolver.Resolver.query.  In my testing I saw that this
  exception was occassionally thrown but not being caught.
…k of a line there was causing the command line options to be parsed incorrectly.
Copy link
Member Author

@jsf9k jsf9k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions about domain.errors and domain.syntax_errors.

@@ -221,20 +215,16 @@ def spf_scan(domain):
logging.debug("\tResult Differs: Expected [{0}] - Actual [{1}]".format(result, response[0]))
domain.errors.append("Result Differs: Expected [{0}] - Actual [{1}]".format(result, response[0]))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Should this be domain.syntax_errors?
  2. The field domain.errors doesn't even appear in the output of trustymail. Should it? Note that the method handle_errors below also writes to domain.errors.

Copy link
Member Author

@jsf9k jsf9k Nov 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This a moot point now, since I added domain.errors to the output of trustymail.

@jsf9k jsf9k self-assigned this Nov 17, 2017
@jsf9k
Copy link
Member Author

jsf9k commented Nov 18, 2017

Please don't merge this pull request just yet. It occurred to me that maybe the "flakiness" we saw was due to the inherent unreliability of UDP. I'd like to test using TCP instead to see if I still see the odd behavior.

I did some testing with the dnspython code using TCP and it's definitely more reliable. I want to stick with the dnspython code and not revert to the py3dns code we were using, since dnspython allows me to set a flag that retries queries if we receive a SERVFAIL response. That response can be sent if there is a temporary network glitch. I don't see a similar flag available for py3dns.

TLDR; Go ahead and merge this pull request if you want. 😃

field is quite helpful when debugging why trustymail gives teh result
it does fro a domain.
  reliability.  We only care about the content and correctness of the
  DNS records, not whether they fit inside a single UDP packet.
* Retry DNS queries that return SERVFAIL, since this may only indicate
  a temporary network issue.
* Removed some hard-wired timeout code that was mistakenly committed.
We were using 127.0.0.1.  When the domain being scanned uses the ptr
mechanism, a reverse DNS lookup is done against the IP of the mail
sender.  This fails for 127.0.0.1.  The solution is to use an IP that
1. Has a valid DNS PTR record
2. Is not used as a mail server by anyone

I tried using c1b1.ncats.cyber.dhs.gov, since that has a valid DNS PTR
record in our AWS Route 53 setup and it is not used as a mail server.
I found, though, that Google DNS does not return the PTR record for
that hostname.  Until that is resolved I'm using an IP that is used by
www.google.com that does have a valid PTR record and should not used
as a mail server by anyone we scan.
@jsf9k jsf9k force-pushed the bugfix/not_always_idempotent branch from a5c6cbd to 8e1fc18 Compare November 21, 2017 22:08
current-federal.csv.  This was for two reasons:
* PublicSuffixList returns a lower case version of the base domain.
  The case was not taken into account when comparing the base domain
  returned by PublicSuffixList with the domain passed into the Domain
  constructor.
* When the domain is not equal to the base domain returned by
  PublicSuffixList, the Domain constructor does a DMARC scan to
  populate some data.  That scan was being called in a way that was
  inconsistent with the current trustymail code.

I fixed these issues by:
* Converting domain names to lowercase in the Domain constructor and
  only using the lowercase version.
* Changing the DMARC scan function call to be consistent with the
  current trustymail code.

A further advantage of the first point is that many needless DMARC
scans are no longer taking place.
/etc/resolv.conf ONLY if no DNS hostnames are specified.
we were using appears to be also used for mail.
Copy link
Member

@h-m-f-t h-m-f-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the excellent additions and fixes, @jsf9k. I've documented a few minor concerns, but given our timetable I'm going to commit these, filing issues where needed.

@@ -142,7 +143,8 @@ def generate_results(self):
"DMARC Results on Base Domain": self.parent_dmarc_results(),
"DMARC Policy": self.get_dmarc_policy(),

"Syntax Errors": self.format_list(self.syntax_errors)
"Syntax Errors": self.format_list(self.syntax_errors),
"Errors": self.format_list(self.errors)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we embed scan results in our reports, and the attachments do get (occasionally 😅) looked at, the more clarity we can provide up front to enable self-servicing the better, lest we get sent a ton of inquiries. I agree with #27 that this should probably be renamed, and clarified. For example, the README indicates some of this ambiguity. A snippit reads:

  • Syntax Errors - A list of syntax errors that were detected when scanning DMARC or SPF records, or checking for STARTTLS support.
  • Errors - A list of any other errors encountered, such as DNS failures.

If Syntax Errors are just for errors of syntax, network error information about STARTTLS should probably not go there.

Perhaps Debug is a better field name than Errors?

Also, the information that lands in Errors can be ambiguous as to its provenance. For instance, in a scan of one domain, the error field results were:

[Errno 101] Network is unreachable, [Errno 101] Network is unreachable, [Errno 101] Network is unreachable, [Errno 101] Network is unreachable, [Errno 101] Network is unreachable

There actually was a network-related error. However, the error output is not clear as to the object whose network appears unreachable. (The domain scanned has 5 mail servers listed on their MX record, so I expect this error is in reference to those 5 hostnames.)

@@ -22,6 +23,14 @@
--dmarc Only check dmarc records
--json Output is in json format (default csv)
--debug Output should include error messages.
--dns-hostnames=HOSTNAMES A comma-delimited list of DNS servers to query
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider renaming dns-hostnames just dns. In most instances, IPs will be entered, not actual hostnames, and dns is faster to type on the command-line.

# virginia.edu resolves to until we resolve why Google
# DNS does not return the same PTR records as the CAL
# DNS does for 64.69.57.18.
query = spf.query("128.143.22.36", "email_wizard@" + domain.domain_name, domain.domain_name, strict=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm interested in the long-term solution to this, given @dhs-ncats' intent to craft tools to be reusable outside of our specific needs. I wonder if setting an IP-with-a-valid-PTR-record-that-doesn't-send-mail needs to be something a user supplies at the command line or as an environmental variable in order to run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other folks can use our IP-with-a-valid-PTR-record-that-doesn't-send-mail once the wrinkles are ironed out with our DNS. The IP I was going to use does have a valid PTR record (although it isn't publicly available yet) and does not send mail.

Having such an IP as a precondition for using trustymail seems to put a lot of burden on the user. I don't personally have such an IP, for example, because I don't own any static IPs. I would have to find one.

@h-m-f-t h-m-f-t merged commit 2ea2094 into master Nov 22, 2017
@h-m-f-t h-m-f-t deleted the bugfix/not_always_idempotent branch November 22, 2017 19:24
@IanLee1521
Copy link
Collaborator

@jsf9k -- A recommendation for the future, might be good to prefix the pull request name with WIP: ... to denote that it's a work in progress: https://ben.balter.com/2015/12/08/types-of-pull-requests/

@jsf9k
Copy link
Member Author

jsf9k commented Dec 4, 2017

I'll do that. Thanks.

@jayvdb jayvdb mentioned this pull request Jan 25, 2020
mcdonnnj pushed a commit that referenced this pull request Jan 23, 2023
mcdonnnj pushed a commit that referenced this pull request Jan 23, 2023
Add codeowners file with team OIS maintainers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants