net: use interruptible async getaddrinfo wrapper from libevent for DNS #27505
Afaik we currently do not expose libevent on any public facing interface (e.g. p2p) and only use it for things that aren't supposed to be exposed (e.g. rpc or rest). By using it for DNS queries we would be changing that (e.g. a malicious DNS seeder) and I'm not sure if that is the best idea given that libevent is pretty archaic (I think the MSan, ASan, TSan and LSan failures in the CI are kind of proving my point).
@dergoegge I think at least some of the CI failures are memory leaks from my code, I'm going to fix that. But I hear your point about the arcane library. Any suggestions?
The issue seems somewhat stale so maybe no need to fix? Judging by your last comment, you were also not able to reproduce? (#16778 (comment))
I was able to somewhat reproduce -- by sending all DNS requests on my machine to a blackhole resolver, I observed a 30 second pause during shutdown. But I think that 30 seconds may be platform-specific, or whatever the local
`thread.join();`
Wouldn't this also wait indefinitely for `getaddrinfo` to finish before the thread can join?
There is a 2-second timeout waiting for the callback, and then the loop closes. With my black hole DNS resolver, the patch saves 30 seconds against master on the unit test in the second commit.
It took me some work to figure out that using `..._once()` to start the event loop would allow me to quit the loop manually, which allows `join()` to succeed.
(libevent does not call the platform getaddrinfo for dns requests, it has its own async methods)
> libevent does not call the platform getaddrinfo for dns requests, it has its own async methods

That explains why it doesn't hang, thanks!
But it does seem like it uses `getaddrinfo` in some cases: https://github.com/libevent/libevent/blob/75208132d5b7a8fff59ca3bf47253179ec314951/evutil.c#L1686
Yeah, I think that's just to decode numeric addresses like "100.200.30.40" etc. It looked to me like if an actual DNS request is necessary, it has its own methods to make requests directly to the nameservers provided by the platform? When things get "serious":
https://github.com/libevent/libevent/blob/master/evdns.c#L5623
In general, I'm ~0 on leaning into further libevent usage, especially for something like this. The upstream code is currently not necessarily very well maintained or tested. There is also at least one open issue which reports
As an alternative to this PR, could we give the DNS thread time to join cleanly (e.g. 5s) and, if it doesn't, just detach it and let the OS handle the cleanup? That risks memory leaks, but that shouldn't really matter when the program is about to exit anyway. I would also be fine with not addressing this at all, because it seems like this only happens when a system generally fails to make DNS requests? In the absence of nice solutions, it seems like that isn't our problem and the user should fix their system instead.
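For illustration only, the "join with a grace period, otherwise detach" idea could look roughly like the sketch below. It uses only the C++ standard library; `JoinOrDetach` and `StartWorker` are hypothetical names that do not exist in the codebase, and the worker just sleeps as a stand-in for a thread stuck in a blocking lookup.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper (not part of this PR): wait up to `grace` for the worker
// to signal completion through `done`. If it signals in time, join it and
// return true; otherwise detach it and return false.
bool JoinOrDetach(std::thread& worker, std::future<void>& done, std::chrono::milliseconds grace)
{
    assert(worker.joinable());
    if (done.wait_for(grace) == std::future_status::ready) {
        worker.join();
        return true;
    }
    // The thread is left running; its resources are reclaimed by the OS when
    // the process exits, which is the leak risk mentioned above.
    worker.detach();
    return false;
}

// Hypothetical stand-in for the DNS thread: sleeps for `work`, then signals.
std::thread StartWorker(std::chrono::milliseconds work, std::promise<void> done)
{
    return std::thread([work, p = std::move(done)]() mutable {
        std::this_thread::sleep_for(work);
        p.set_value();
    });
}
```

A detached thread must not touch objects that are destroyed during shutdown, so in practice this would only be reasonable if the lookup thread owns everything it uses.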
@fanquake @dergoegge good feedback, thanks. I'll look into a different approach moving the lib call. There is another thought I had about name resolution moving forward: if bitcoin can make more interesting DNS queries (either using a library or just implementing some bare necessities), we could ask the DNS seeders for I2P and onion addresses as TXT records.
It's very unlikely we are going to add a new external dependency to make "interesting" DNS queries.
Closing this now for alternative in #27557 which just calls
Closes #16778

Bitcoin uses `getaddrinfo` to make DNS requests for DNS seed servers and for adding peers with `-addnode`, `-seednode` and `-connect`. Depending on the platform this can be clunky, and a system issue could prevent name resolution from completing at all, blocking the thread and in some cases preventing a clean shutdown.

An attempt was made to switch to the asynchronous `getaddrinfo_a` in #4421 but that was reverted in #9229 after discovering that function has a segfault!

Taking BlueMatt's suggestion in #10215 (comment), this PR modifies our `g_dns_lookup` function to use `evdns_getaddrinfo()` from libevent. This is an asynchronous function but I've implemented it in a polling loop so it still blocks -- but now will time out after 2 seconds.

TODO:
- `interruptNet`
- `nic.com`: that test will fail if the platform has no DNS

Future work:
- `TXT` records with onion addresses from our DNS seeders