Remove bitnodes.io from dnsseeds. #5545
Conversation
I'm not comfortable with retaining this entry.
Hi Greg, care to elaborate more? I notice it could be the surge of a few getaddr.bitnodes.io crawlers from AWS, but they are not mine. The crawler for the Bitnodes project should only come from 148.251.238.178.
Please explain to me why 148.251.238.178 is still staying pinned up to every node in the network. It was my understanding that after the last round of concerns you said you would discontinue that behaviour. Are you saying you have no hosts other than 148.251.238.178 connecting outbound at large scale to the network?
I only operate the Bitnodes crawler from 148.251.238.178. I am not sure about the AWS ones; anyone could run the crawler using the same user agent: https://github.com/ayeowch/bitnodes/blob/master/protocol.py#L103 At the moment, my crawler keeps at most one active connection with each reachable node to aggregate the metrics for https://getaddr.bitnodes.io/dashboard/. It keeps bandwidth usage to a minimum, sending no further data after the handshake except for periodic ping/pong messages. You will see an inbound connection from 148.251.238.178.
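The user agent mentioned above is simply a self-reported string inside the P2P `version` message, which is why anyone can run a crawler that claims to be bitnodes.io. Below is a minimal, illustrative sketch (not the Bitnodes code; the field layout follows the public Bitcoin P2P protocol, and protocol version 70002 is chosen arbitrarily) of constructing such a message in Python:

```python
import hashlib
import struct
import time

MAGIC_MAIN = b"\xf9\xbe\xb4\xd9"  # Bitcoin mainnet network magic


def checksum(payload: bytes) -> bytes:
    # First 4 bytes of double-SHA256, per the P2P message framing.
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def serialize_msg(command: bytes, payload: bytes) -> bytes:
    # 24-byte header: magic, zero-padded command, payload length, checksum.
    return (MAGIC_MAIN
            + command.ljust(12, b"\x00")
            + struct.pack("<I", len(payload))
            + checksum(payload)
            + payload)


def version_payload(user_agent: bytes, start_height: int = 0) -> bytes:
    # Minimal 'version' payload; both network addresses are zeroed out,
    # which is enough for a probe that only wants the handshake.
    p = struct.pack("<iQq", 70002, 0, int(time.time()))  # version, services, timestamp
    p += b"\x00" * 26                                    # addr_recv (zeroed)
    p += b"\x00" * 26                                    # addr_from (zeroed)
    p += struct.pack("<Q", 0)                            # nonce
    p += bytes([len(user_agent)]) + user_agent           # var_str user agent (< 0xfd bytes)
    p += struct.pack("<i", start_height)                 # start_height
    return p


# Anyone can put any string here -- the user agent proves nothing
# about who actually operates the crawler.
msg = serialize_msg(b"version", version_payload(b"/bitnodes.io:0.1/"))
```

This is why multiple unrelated hosts can all show up in `getpeerinfo` with a bitnodes.io user agent.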
Um, if bitnodes is maintaining a connection to every peer they can, then ACK ACK this PR...
@luke-jr Most seeders, e.g. /bitcoinseeder:0.01/, enumerate all reachable nodes periodically anyway to get a snapshot of the network. Bitnodes simply uses this data to come up with the network metrics, allowing us to know the state/size of the network at any one time. I will ACK this PR anyway since it has come up again and again. It's quite silly.
There's a big difference between crawling the network periodically and maintaining a constant connection. You'll note my DNS seed provides the same network metrics (for longer than you have, actually) without keeping any persistent connections...
@luke-jr My background is in networking, so I may be looking at this from a different perspective: I imagine it is inefficient to disconnect from nodes you are already connected to if you are going to connect to them again after hearing about them in an addr response from a peer in the next crawl.
@ayeowch The problem with staying connected is that you're holding up a connection slot permanently that could otherwise be used by a node or SPV wallet. It is better-behaved to poll once in a reasonable while, request e.g. the version, and immediately disconnect.
@ayeowch You're right that there is a certain overhead involved in connecting and disconnecting from a peer, but that's almost nothing compared to the resources you're requiring from them by staying connected (a few MB of RAM per connection, plus all the CPU involved in sending and tracking advertised blocks and transactions). In addition, as @laanwj already mentioned: connection slots are a limited resource that is valuable to the network. Please don't waste it.
ACK, based on criteria provided. |
@laanwj @sipa Thanks for the valuable input. I had in the past configured the crawler to connect and disconnect after a full handshake. This meant the crawler had to reconnect after each full snapshot of the network was taken, which made it appear abusive in nature to some users. @sipa I have monitored the transferred bytes closely between the crawler and each node, but didn't realize the undesirable RAM/CPU usage incurred just to maintain the connection. I am going to revisit my approach in the next couple of days and, at the very least, revert the crawler to its previous behaviour, i.e. connect to and disconnect from each node to take a snapshot of the network.
ACK for now, but NACK if problems are fixed in the next few days. |
@ayeowch Thanks for considering changing your approach. Let's wait a few days before merging.
I will continue to be uncomfortable even if the behaviour is changed. I really don't want to get into a debate here, but we've had several instances where the behaviour of the bitnodes crawlers was bad in a way that was very obvious to all (or nearly all) of the regular contributors to the project. In each case there was some discussion, and at least I left with the impression that the behaviour would be fixed. Then later we found it doing something substantially similar, unchanged, or even worse. Clearly there are some consistent problems with communication and/or understanding.

In the case of the dnsseed itself, much of its behaviour is even more difficult to monitor or audit than the crawler's. Plus, we have no great and urgent need for more dnsseeds; in fact, our reliance on them is lower than it was in the past. With that in mind, I'd rather not waste my cycles trying to figure out ways to determine whether this seed is behaving correctly, and I suspect that Ayeowch would probably rather not spend time defending the seed service, which is nothing but a cost to him, especially with us being demanding and seemingly unappreciative.

I'm currently working on tools that will help protect the users/network from abusive or intrusive behaviour like we've seen from this crawler software, regardless of how well things are or aren't communicated or what people's intentions are. So I do not consider the behaviour of those particular hosts gating here (though it should also be improved), but rather just a sign that there is some kind of persistent, serious misunderstanding and miscommunication, and I'd rather not have a DNS seed operating under that. Cheers.
@gmaxwell Well said, Greg :) I am glad that you already have work underway to better establish what is considered abusive/non-abusive towards the network. As much as I wanted to defend myself, and certainly want to help the network by offering the DNS seeder, I agree much of the problem with the crawler was due to miscommunication. English is not my first language, so that could be another reason. I'm still working on the crawler right now to avoid keeping connections open. The only immediate gating issue for me is the node alert feature on the Bitnodes site. The feature currently sends an alert to a subscribed node owner as soon as their node is disconnected from the crawler, so I will have to think of another way to do this. If you have thoughts about this or any issues regarding the crawler, please feel free to let me know (ayeowch@gmail.com). At any rate, I am now less than comfortable having my DNS seeder in the reference implementation, so let's proceed to ACK the removal for now.
I think it's probably fair to take up a connection slot on a node that someone has manually requested monitoring of. |
Using the new peers view I see 10 incoming connections with the user agent bitnodes.io; that app needs fixing! Besides that, it makes no sense to have a crawler in the seeds. They should find nodes, not the other way around. ACK.
@zander I notice the 10 crawlers coming from 54.x.x.x too; they are not mine and do not contribute to the stats on the project website at getaddr.bitnodes.io. I mentioned this in my second comment in this issue. I reverted my crawler earlier today to maintain only one instance, which performs a full handshake with each reachable node and disconnects immediately. It will reconnect again after a full snapshot of the network is taken, i.e. approx. 5 to 8 minutes later. You should be able to see it appearing in your getpeerinfo output as an inbound connection from 148.251.238.178.
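The reverted behaviour described above (one handshake per node per snapshot, disconnecting immediately and reconnecting only on the next cycle) amounts to a simple polling loop. A minimal sketch, not the Bitnodes code: `probe` stands in for a hypothetical connect/handshake/disconnect routine, and the `now`/`sleep` hooks exist only to keep the sketch testable without real time passing:

```python
import time
from typing import Callable, Iterable, List


def snapshot_cycle(addrs: Iterable[str],
                   probe: Callable[[str], bool],
                   interval: float = 300.0,
                   now: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Probe every known node exactly once, disconnecting after each
    handshake, then wait out the remainder of the snapshot interval
    (roughly the 5-8 minute cadence described in the thread)."""
    start = now()
    reachable = [addr for addr in addrs if probe(addr)]
    elapsed = now() - start
    if elapsed < interval:
        # No persistent connections are held between cycles; the crawler
        # simply sleeps until it is time for the next snapshot.
        sleep(interval - elapsed)
    return reachable
```

The design choice laanwj and sipa argued for is exactly this shape: the per-node cost is one short-lived connection per cycle, rather than a permanently occupied connection slot.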
a094b3d Remove bitnodes.io from dnsseeds. (Gregory Maxwell)
I think I will be keeping Addy's seed in both bitcoinj and Bitcoin XT for a couple of reasons:
Addy, for what it's worth, your English is excellent. I don't think the issue here is language related.