Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

LOOPIA: BUG: MX record integration tests failing #2218

Closed
systemcrash opened this issue Mar 23, 2023 · 9 comments
Closed

LOOPIA: BUG: MX record integration tests failing #2218

systemcrash opened this issue Mar 23, 2023 · 9 comments

Comments

@systemcrash
Copy link
Contributor

One test 馃煛

Both of those tests look like the same issue. It is trying to change "foo" to "foo.example.com.". My guess is somewhere there should be a loop that turns all the targets for MX, CNAME, NS (and possibly others) from shortnames to FQDNs.

Let me know how I can help!

Originally posted by @tlimoncelli in #2214 (comment)

@systemcrash
Copy link
Contributor Author

Not sure where to dig for this problem - if your gnome senses know where to zoom in, that'd be a help.

I thought I took care of all correction loops. 鈽癸笍

@tlimoncelli tlimoncelli changed the title One test 馃煛 LOOPIA: BUG: MX record integration tests failing Mar 23, 2023
@tlimoncelli
Copy link
Contributor

Does this test fail if you run the tests on the master branch? If so, it's a global problem that I should fix the the core code.

If it just happens on the v3.30.0test tag, then this is a problem with the providers/loopia code. The fix might look like this:

  switch rtype := record.Type; rtype {
  case "CAA":
    err = rc.SetTargetCAAString(record.Rdata)
  case "MX":
    //err = rc.SetTargetMX(record.Priority, record.Rdata)
    err = rc.SetTargetMX(record.Priority, dnsutil.AddOrigin(record.Rdata, origin))

HTH

@systemcrash
Copy link
Contributor Author

systemcrash commented Mar 23, 2023 via email

@systemcrash
Copy link
Contributor Author

Problem seems more global. I did:

git checkout 8e643c28564b9010bf57e4b6dc2d1aafc1dbe47a
git cherry-pick 178ab4635cf20493f1d5c57ef1f6530ddbabf316 --no-commit
cd integrationTest/
go test -v -verbose -provider LOOPIA -start 18 -end 18

=== RUN   TestDNSProviders
=== RUN   TestDNSProviders/example.com
=== RUN   TestDNSProviders/example.com/Clean_Slate:Empty
=== RUN   TestDNSProviders/example.com/18:testByRecordSet:initial
    integration_test.go:223:
        + CREATE A bar.example.com 1.2.3.4 ttl=300
    integration_test.go:223:
        + CREATE A foo.example.com 2.3.4.5 ttl=300
    integration_test.go:223:
        + CREATE A foo.example.com 3.4.5.6 ttl=300
    integration_test.go:223:
        + CREATE MX foo.example.com 10 foo.example.com. ttl=300
    integration_test.go:223:
        + CREATE MX foo.example.com 20 bar.example.com. ttl=300
    integration_test.go:242: Expected 0 corrections on second run, but found 2.
    integration_test.go:244: UNEXPECTED #0: 卤 MODIFY MX foo.example.com: (10 foo ttl=300) -> (10 foo.example.com. ttl=300)
    integration_test.go:244: UNEXPECTED #1: 卤 MODIFY MX foo.example.com: (20 bar ttl=300) -> (20 bar.example.com. ttl=300)
=== RUN   TestDNSProviders/example.com/Post_cleanup:Empty
    integration_test.go:223:
        records affected by deletion of subdomain bar.example.com
        - DELETE A bar.example.com 1.2.3.4 ttl=300
    integration_test.go:223:
        records affected by deletion of subdomain foo.example.com
        - DELETE MX foo.example.com 10 foo ttl=300
        - DELETE MX foo.example.com 20 bar ttl=300
        - DELETE A foo.example.com 2.3.4.5 ttl=300
        - DELETE A foo.example.com 3.4.5.6 ttl=300
--- FAIL: TestDNSProviders (33.94s)
    --- FAIL: TestDNSProviders/example.com (33.94s)
        --- PASS: TestDNSProviders/example.com/Clean_Slate:Empty (2.47s)
        --- FAIL: TestDNSProviders/example.com/18:testByRecordSet:initial (23.75s)
        --- PASS: TestDNSProviders/example.com/Post_cleanup:Empty (7.44s)

if it helps you any, Loopia is not a 'by label' provider. They're a 'incremental-record' provider.

I don't mind that this happens and eventually succeeds, but if it's a fail, so be it. Let's fix it.

@systemcrash
Copy link
Contributor Author

Just for fun, I tried your proposed fix,

	case "MX":
		// err = rc.SetTargetMX(record.Priority, record.Rdata)
		err = rc.SetTargetMX(record.Priority, dnsutil.AddOrigin(record.Rdata, origin))

and the results are the same but the names are not just subdomain names, but fqdns without the dot:

=== RUN   TestDNSProviders
=== RUN   TestDNSProviders/example.com
=== RUN   TestDNSProviders/example.com/Clean_Slate:Empty
=== RUN   TestDNSProviders/example.com/18:testByRecordSet:initial
    integration_test.go:223:
        + CREATE A bar.example.com 1.2.3.4 ttl=300
    integration_test.go:223:
        + CREATE A foo.example.com 2.3.4.5 ttl=300
    integration_test.go:223:
        + CREATE A foo.example.com 3.4.5.6 ttl=300
    integration_test.go:223:
        + CREATE MX foo.example.com 10 foo.example.com. ttl=300
    integration_test.go:223:
        + CREATE MX foo.example.com 20 bar.example.com. ttl=300
    integration_test.go:242: Expected 0 corrections on second run, but found 2.
    integration_test.go:244: UNEXPECTED #0: 卤 MODIFY MX foo.example.com: (10 foo.example.com ttl=300) -> (10 foo.example.com. ttl=300)
    integration_test.go:244: UNEXPECTED #1: 卤 MODIFY MX foo.example.com: (20 bar.example.com ttl=300) -> (20 bar.example.com. ttl=300)
=== RUN   TestDNSProviders/example.com/Post_cleanup:Empty
    integration_test.go:223:
        records affected by deletion of subdomain bar.example.com
        - DELETE A bar.example.com 1.2.3.4 ttl=300
    integration_test.go:223:
        records affected by deletion of subdomain foo.example.com
        - DELETE A foo.example.com 2.3.4.5 ttl=300
        - DELETE A foo.example.com 3.4.5.6 ttl=300
        - DELETE MX foo.example.com 10 foo.example.com ttl=300
        - DELETE MX foo.example.com 20 bar.example.com ttl=300
--- FAIL: TestDNSProviders (24.68s)
    --- FAIL: TestDNSProviders/example.com (24.68s)
        --- PASS: TestDNSProviders/example.com/Clean_Slate:Empty (2.47s)
        --- FAIL: TestDNSProviders/example.com/18:testByRecordSet:initial (13.62s)
        --- PASS: TestDNSProviders/example.com/Post_cleanup:Empty (8.16s)

Fail. Then I tweaked your fix as a hack (I don't fully understand the implications of what I did) to;

	case "MX":
		// err = rc.SetTargetMX(record.Priority, record.Rdata)
		err = rc.SetTargetMX(record.Priority, dnsutil.AddOrigin(record.Rdata, origin) + ".")

and it passed:

=== RUN   TestDNSProviders
=== RUN   TestDNSProviders/example.com
=== RUN   TestDNSProviders/example.com/Clean_Slate:Empty
=== RUN   TestDNSProviders/example.com/18:testByRecordSet:initial
    integration_test.go:223:
        + CREATE A bar.example.com 1.2.3.4 ttl=300
    integration_test.go:223:
        + CREATE A foo.example.com 2.3.4.5 ttl=300
    integration_test.go:223:
        + CREATE A foo.example.com 3.4.5.6 ttl=300
    integration_test.go:223:
        + CREATE MX foo.example.com 10 foo.example.com. ttl=300
    integration_test.go:223:
        + CREATE MX foo.example.com 20 bar.example.com. ttl=300
=== RUN   TestDNSProviders/example.com/18:testByRecordSet:changeOne
    integration_test.go:223:
        卤 MODIFY A foo.example.com: (3.4.5.6 ttl=300) -> (8.8.8.8 ttl=300)
=== RUN   TestDNSProviders/example.com/18:testByRecordSet:deleteOne
    integration_test.go:223:
        - DELETE A foo.example.com 8.8.8.8 ttl=300
=== RUN   TestDNSProviders/example.com/18:testByRecordSet:addOne
    integration_test.go:223:
        + CREATE A foo.example.com 8.8.8.8 ttl=300
=== RUN   TestDNSProviders/example.com/Post_cleanup:Empty
    integration_test.go:223:
        records affected by deletion of subdomain foo.example.com
        - DELETE A foo.example.com 2.3.4.5 ttl=300
        - DELETE A foo.example.com 8.8.8.8 ttl=300
        - DELETE MX foo.example.com 10 foo.example.com. ttl=300
        - DELETE MX foo.example.com 20 bar.example.com. ttl=300
    integration_test.go:223:
        records affected by deletion of subdomain bar.example.com
        - DELETE A bar.example.com 1.2.3.4 ttl=300
--- PASS: TestDNSProviders (57.27s)
    --- PASS: TestDNSProviders/example.com (57.27s)
        --- PASS: TestDNSProviders/example.com/Clean_Slate:Empty (2.47s)
        --- PASS: TestDNSProviders/example.com/18:testByRecordSet:initial (13.63s)
        --- PASS: TestDNSProviders/example.com/18:testByRecordSet:changeOne (11.13s)
        --- PASS: TestDNSProviders/example.com/18:testByRecordSet:deleteOne (11.12s)
        --- PASS: TestDNSProviders/example.com/18:testByRecordSet:addOne (11.15s)
        --- PASS: TestDNSProviders/example.com/Post_cleanup:Empty (7.43s)

@tlimoncelli
Copy link
Contributor

Perfect!

Could you send a PR, please?

@systemcrash
Copy link
Contributor Author

Sure - although does your code still have agency if I just tack on a "." there?

And isn't there a smarter helper function to add a "." (one that verifies if one doesn't already exist)?

@tlimoncelli
Copy link
Contributor

Sure - although does your code still have agency if I just tack on a "." there?

After a call to AddOrigin it is always safe to add a ".".

And isn't there a smarter helper function to add a "." (one that verifies if one doesn't already exist)?

Nope. variable + "." is more readable and more correct.

You'd be surprised at how rarely "add a dot if it missing" conditional is needed. APIs are very consistent: they return a field that is either FQDN all the time or FQDN+dot all the time. I've never seen an API that says "sometimes you'll get a trailing dot, but not always". In all the years this project has been around, no API has made that kind of breaking change. Therefore there's no need for such a conditional when processing API-generated values. It would add unneeded complexity, and add to the testing burden.

On the other hand, humans are inconsistent. We need that kind of thing for data that comes from dnsconfig.js, but that is handled by DNSControl long before the provider gets access to the data. In fact, correcting what a human does is so complex that I had to write an article about the process: https://docs.dnscontrol.org/language-reference/why-the-dot

At this point I may not have convinced you, so let's look at what would happen if the API did change:

API included a dot, now it doesn't: You would think that having "if no trailing dot, add it" would future-proof the code and save the day, right? Nope. We can't be sure of that.

Why would the provider make such a change? They wouldn't make the change for no reason, why break their clients?
They're not going to change just to save a byte. Bandwidth is expensive but nobody is going to try to save one byte. However they might make such a change because they've decided to return shortnames when possible, and the "." indicates a FQDN is being returned. In that situation, our defensive code was the wrong thing to do. Now we're turning "foo" into "foo." when it should be "foo.domain.com".

API didn't include the dot, now it does: Why did the API change? Did they get a great discount from their ISP and decided to luxuriously start wasting bandwidth like a drunken sailor? Unlikely. Random change? Unlikely... why break clients for no reason? Was it because they're now sending short names and need to signal when a FQDN is being sent? Possibly. Now our future-proofing code is doing the wrong thing.

Better than "if no trailing dot, add it", would be "if no trailing dot, panic("the API did something unexpected"). And, while I like that kind of defensive data validation the risk does not outweigh the reward. Such a protocol change would break most of their users, so they're not going to make such a change without a major version change (semvar).

@systemcrash
Copy link
Contributor Author

systemcrash commented Mar 25, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants