Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows DNS Server Order by Route Metrics #81

Merged
merged 2 commits into from Mar 13, 2017

Conversation

Projects
None yet
5 participants
@bradh352
Copy link
Member

commented Jan 10, 2017

Based off original patch submitted by Brad Spencer bspencer@blackberry.com via https://c-ares.haxx.se/mail/c-ares-archive-2016-04/0000.shtml

The original patch relied on Windows Vista-specific functions, I've made slight tweaks to load those functions dynamically for compatibility with older systems.

Below is the original submission comments:
On Windows, the c-ares DNS resolver tries first to get a full list of
DNS server addresses by enumerating the system's IPv4/v6 interfaces and
then getting the per-interface DNS server lists from those interfaces
and joining them together. The OS, at least in the way the c-ares
prefers to query them (which also may be the only or best way in some
environments), does not provide a unified list of DNS servers ordered
according to "current network conditions". Currently, c-ares will then
try to use them in whatever order the nested enumeration produces, which
may result in DNS requests being sent to servers on one interface
(hosting the current default route, for example) that are only intended
to be used via another interface (intended to be used when the first
interface is not available, for example). This, in turn, can lead to
spurious failures and timeouts simply because of the server address
order that resulted because of the enumeration process.

This patch makes the (safe?) assumption that there is no other better
rule to chose which interface's DNS server list should be prioritized.
After all, a DNS lookup isn't something "per network"; applications
don't look up "these DNS names on this interface and those DNS names on
that interface". There is a single resource pool of DNS servers and the
application should presume that any server will give it the "right"
answer. However, even if all DNS servers are assumed to give equally
useful responses, it is reasonable to expect that some DNS servers will
not accept requests on all interfaces. This patch avoids the problem by
sorting the DNS server addresses using the Windows IPv4/v6 routing tables.

For example, a request to DNS server C on interface 2 that is actually
sent over interface 1 (which may happen to have the default route) may
be rejected by or not delivered to DNS server C. So, better to use DNS
servers A and B associated with interface 1, at least as a first try.

By using the metric of the route to the DNS server itself as a proxy for
priority of the DNS server in the list, this patch is able to adapt
dynamically to changes in the interface list, the DNS server lists per
interface, which interfaces are active, the routing table, and so on,
while always picking a good "best" DNS server first.

In cases where any DNS server on any interface will do, this patch still
seems useful because it will prioritize a lower-metric route's (and thus
interface's) servers.

@coveralls

This comment has been minimized.

Copy link

commented Jan 10, 2017

Coverage Status

Coverage remained the same at 95.421% when pulling bf4a74b on bradh352:master into 4d56128 on c-ares:master.

@coveralls

This comment has been minimized.

Copy link

commented Jan 10, 2017

Coverage Status

Coverage remained the same at 95.421% when pulling b584008 on bradh352:master into 4d56128 on c-ares:master.

@coveralls

This comment has been minimized.

Copy link

commented Jan 10, 2017

Coverage Status

Coverage remained the same at 95.421% when pulling b584008 on bradh352:master into 4d56128 on c-ares:master.

@bradh352

This comment has been minimized.

Copy link
Member Author

commented Jan 10, 2017

@bagder Is AppVeyor tied into pull requests? If so, how do I view the results?

@bagder

This comment has been minimized.

Copy link
Member

commented Jan 10, 2017

AppVeyor is unfortunately not used by c-ares at all. For the very silly reason that I can't manage to actually add it. On the @appveyor site I have created an "cares-org" user and when I try to create a github project there, it offers me to use any of my other 30+ github repositories (even those under different organizations that I'm a member of) except the c-ares ones. But for all I know, the c-ares org allows appveyor just as well as the other orgs I'm in...

I suggest someone else tries to do it.

@daviddrysdale

This comment has been minimized.

Copy link
Member

commented Jan 10, 2017

My fork has AppVeyor integration enabled, so I pulled/pushed -- results at: https://ci.appveyor.com/project/daviddrysdale/c-ares/build/1.0.230

@bradh352

This comment has been minimized.

Copy link
Member Author

commented Jan 10, 2017

Thanks @daviddrysdale ! Can you re-pull? I fixed the issue, it was due to a Windows SDK prior to 8.1

@daviddrysdale

This comment has been minimized.

@coveralls

This comment has been minimized.

Copy link

commented Jan 10, 2017

Coverage Status

Coverage remained the same at 95.421% when pulling e5f196a on bradh352:master into 4d56128 on c-ares:master.

@bradh352

This comment has been minimized.

Copy link
Member Author

commented Jan 10, 2017

@daviddrysdale much better, just looks like the legacy mingw w32api is too old (predates Vista), I've updated the appveyor.yml to use mingw-w64 instead which is what pretty much everyone uses these days.

@coveralls

This comment has been minimized.

Copy link

commented Jan 10, 2017

Coverage Status

Coverage remained the same at 95.421% when pulling 8c7caa8 on bradh352:master into 4d56128 on c-ares:master.

@bradh352

This comment has been minimized.

Copy link
Member Author

commented Jan 10, 2017

Ugh, AppVeyor information is really hard to come by. It seems they name gcc just gcc, but make on the other hand is mingw32-make for some odd reason. @daviddrysdale any chance you could pull it again, hopefully it will just work.

@coveralls

This comment has been minimized.

Copy link

commented Jan 10, 2017

Coverage Status

Coverage remained the same at 95.421% when pulling 223f53f on bradh352:master into 4d56128 on c-ares:master.

@daviddrysdale

This comment has been minimized.

@bradh352

This comment has been minimized.

Copy link
Member Author

commented Jan 10, 2017

looks good to me, the errors its hitting are in the tests, which nothing there was modified, so not me!

thanks @daviddrysdale !

@bradh352

This comment has been minimized.

Copy link
Member Author

commented Jan 11, 2017

@daviddrysdale I went ahead and fixed the tests for windows in the pull request, it was fairly simple. (though not in any way related to the pull request).

However, when run, it produced 3 failures:

[ FAILED ] 3 tests, listed below:
[ FAILED ] Modes/DefaultChannelModeTest.LiveGetLocalhostByNameV6/1, where GetParam() = "b"
[ FAILED ] Modes/DefaultChannelModeTest.LiveGetLocalhostByNameV6/2, where GetParam() = "fb"
[ FAILED ] Modes/DefaultChannelModeTest.LiveGetLocalhostByNameV6/3, where GetParam() = "bf"

I didn't research the issue any further.

@coveralls

This comment has been minimized.

Copy link

commented Jan 11, 2017

Coverage Status

Coverage remained the same at 95.421% when pulling 6532308 on bradh352:master into 4d56128 on c-ares:master.

@daviddrysdale

This comment has been minimized.

Copy link
Member

commented Jan 11, 2017

Thanks for the test fix; https://ci.appveyor.com/project/daviddrysdale/c-ares/build/1.0.237 builds OK.

I'm not sure what's up with the tests under MinGW -- I disabled them in AppVeyor a while ago when they failed there, but not locally for me.

@daviddrysdale

This comment has been minimized.

Copy link
Member

commented Jan 11, 2017

Just to be clear, we still need someone with Windows knowledge (i.e. not me) to look this over...

@bradh352

This comment has been minimized.

Copy link
Member Author

commented Jan 11, 2017

@daviddrysdale recent versions of Windows don't have an entry for localhost in the hosts file at all, so I think this is expected to fail on Windows for the file tests. Also, I'm not sure if all DNS servers will answer localhost with ::1 or 127.0.0.1 when queried. I think this test case itself might just be bad and not an actual c-ares failure (unless of course it would be desirable to internally provide localhost resolution if not in the hosts file).

Heres the output of vtest:
[ RUN ] Modes/DefaultChannelModeTest.LiveGetLocalhostByNameV6/0
HostCallback({ARES_ECONNREFUSED {'' aliases=[] addrs=[]}})
[ OK ] Modes/DefaultChannelModeTest.LiveGetLocalhostByNameV6/0 (0 ms)
[ RUN ] Modes/DefaultChannelModeTest.LiveGetLocalhostByNameV6/1
HostCallback({ARES_SUCCESS {'localhost' aliases=[localhost] addrs=[]}})
ares-test-live.cc:98: Failure
Value of: (int)result.host_.addrs_.size()
Actual: 0
Expected: 1
[ FAILED ] Modes/DefaultChannelModeTest.LiveGetLocalhostByNameV6/1, where GetParam() = "b" (32 ms)
[ RUN ] Modes/DefaultChannelModeTest.LiveGetLocalhostByNameV6/2
HostCallback({ARES_SUCCESS {'localhost' aliases=[localhost] addrs=[]}})
ares-test-live.cc:98: Failure
Value of: (int)result.host_.addrs_.size()
Actual: 0
Expected: 1
[ FAILED ] Modes/DefaultChannelModeTest.LiveGetLocalhostByNameV6/2, where GetParam() = "fb" (31 ms)
[ RUN ] Modes/DefaultChannelModeTest.LiveGetLocalhostByNameV6/3
HostCallback({ARES_SUCCESS {'localhost' aliases=[localhost] addrs=[]}})
ares-test-live.cc:98: Failure
Value of: (int)result.host_.addrs_.size()
Actual: 0
Expected: 1
[ FAILED ] Modes/DefaultChannelModeTest.LiveGetLocalhostByNameV6/3, where GetParam() = "bf" (47 ms)

@daviddrysdale

This comment has been minimized.

Copy link
Member

commented Jan 12, 2017

@bradh352: good point, there's no guarantee that /etc/hosts will have a localhost entry. I've pulled this out into a separate issue (#85)

@bagder

This comment has been minimized.

Copy link
Member

commented Jan 12, 2017

I'm not sure if all DNS servers will answer localhost with ::1 or 127.0.0.1 when queried

I'm sure they won't. That's just another name to most DNS servers.

If I'm not mistaking, native name resolver functions in windows will however resolve "localhost" to 127.0.0.1 on their own, without it being present in their /etc/hosts equivalent. (apparently since Windows 7)

Windows DNS server sorting
Original Patch From Brad Spencer:
https://c-ares.haxx.se/mail/c-ares-archive-2016-04/0000.shtml

My modifications include:
 * Dynamically find GetBestRoute2 since it is a Windows Vista+ symbol, and will fall back to prior behavior when not available.
 * Prefer get_DNS_AdaptersAddresses as the modifications should alleviate the concerns which caused us to prefer get_DNS_NetworkParams
 * Update AppVeyor to use MinGW-w64 instead of the legacy MinGW
 * Fix compile error in test suite for Windows.

Original message from patch below:

From: Brad Spencer <bspencer@blackberry.com>
Date: Fri, 29 Apr 2016 14:26:23 -0300

On Windows, the c-ares DNS resolver tries first to get a full list of
DNS server addresses by enumerating the system's IPv4/v6 interfaces and
then getting the per-interface DNS server lists from those interfaces
and joining them together. The OS, at least in the way the c-ares
prefers to query them (which also may be the only or best way in some
environments), does not provide a unified list of DNS servers ordered
according to "current network conditions". Currently, c-ares will then
try to use them in whatever order the nested enumeration produces, which
may result in DNS requests being sent to servers on one interface
(hosting the current default route, for example) that are only intended
to be used via another interface (intended to be used when the first
interface is not available, for example). This, in turn, can lead to
spurious failures and timeouts simply because of the server address
order that resulted because of the enumeration process.

This patch makes the (safe?) assumption that there is no other better
rule to chose which interface's DNS server list should be prioritized.
After all, a DNS lookup isn't something "per network"; applications
don't look up "these DNS names on this interface and those DNS names on
that interface". There is a single resource pool of DNS servers and the
application should presume that any server will give it the "right"
answer. However, even if all DNS servers are assumed to give equally
useful responses, it is reasonable to expect that some DNS servers will
not accept requests on all interfaces. This patch avoids the problem by
sorting the DNS server addresses using the Windows IPv4/v6 routing tables.

For example, a request to DNS server C on interface 2 that is actually
sent over interface 1 (which may happen to have the default route) may
be rejected by or not delivered to DNS server C. So, better to use DNS
servers A and B associated with interface 1, at least as a first try.

By using the metric of the route to the DNS server itself as a proxy for
priority of the DNS server in the list, this patch is able to adapt
dynamically to changes in the interface list, the DNS server lists per
interface, which interfaces are active, the routing table, and so on,
while always picking a good "best" DNS server first.

In cases where any DNS server on any interface will do, this patch still
seems useful because it will prioritize a lower-metric route's (and thus
interface's) servers.
@bradh352

This comment has been minimized.

Copy link
Member Author

commented Jan 13, 2017

My pull request has now been squashed into 1 commit

@coveralls

This comment has been minimized.

Copy link

commented Jan 13, 2017

Coverage Status

Coverage remained the same at 95.421% when pulling 8f252b6 on bradh352:master into 0719277 on c-ares:master.

@b-spencer

This comment has been minimized.

Copy link
Contributor

commented Jan 20, 2017

Hi. I'm the original patch submitter. Sorry for the delay in responding.

@bradh352 , I reviewed the modified version of the changes and they look good to me, FWIW.

It'll be great if these get adopted. Thanks for following through where I couldn't!

@coveralls

This comment has been minimized.

Copy link

commented Feb 10, 2017

Coverage Status

Coverage remained the same at 95.425% when pulling c0bad21 on bradh352:master into 4a89071 on c-ares:master.

@bagder bagder merged commit 39aeafd into c-ares:master Mar 13, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage remained the same at 95.425%
Details
@bagder

This comment has been minimized.

Copy link
Member

commented Mar 13, 2017

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.