
Remove bitnodes.io from dnsseeds. #5545

Merged

merged 1 commit into bitcoin:master Dec 31, 2014

Conversation

gmaxwell
Contributor

I'm not comfortable with retaining this entry.
@gmaxwell gmaxwell added the P2P label Dec 27, 2014
@gmaxwell gmaxwell added this to the 0.10.0 milestone Dec 27, 2014
@ayeowch
Contributor

ayeowch commented Dec 27, 2014

Hi Greg, care to elaborate? I notice there has been a surge of getaddr.bitnodes.io crawlers connecting from AWS recently, but those are not mine:
54.194.231.211
54.183.64.54
54.174.10.182
54.66.214.167
54.94.200.247
54.67.33.14
54.77.251.214
54.66.220.137
54.173.72.127
54.94.195.96

The crawler for the Bitnodes project should only come from 148.251.238.178.

@gmaxwell
Contributor Author

Please explain to me why 148.251.238.178 is still holding a persistent connection to every node in the network. It was my understanding that after the last round of concerns you said you would discontinue that behaviour.

Are you saying you have no other hosts connecting outbound at large scale to the network than 148.251.238.178?

@ayeowch
Contributor

ayeowch commented Dec 27, 2014

I only operate the Bitnodes crawler from 148.251.238.178. I am not sure about the AWS ones; anyone could run the crawler using the same user agent: https://github.com/ayeowch/bitnodes/blob/master/protocol.py#L103
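
As a sketch of why the user agent proves nothing about the operator (an illustration, not code from the bitnodes repository): the subver field is just a length-prefixed string that any client is free to set.

import struct

def encode_varstr(s: bytes) -> bytes:
    # Bitcoin varstr: varint length prefix, then the raw bytes.
    # Short-string case only (< 0xfd bytes) for this sketch.
    assert len(s) < 0xfd
    return struct.pack("<B", len(s)) + s

# Anyone can claim this subver in their own version message:
print(encode_varstr(b"/getaddr.bitnodes.io:0.1/").hex())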

At the moment, my crawler keeps at most one active connection with each reachable node to aggregate the metrics for https://getaddr.bitnodes.io/dashboard/. It keeps bandwidth usage minimal, sending no further data after the handshake apart from periodic ping/pong messages. You will see an inbound connection from 148.251.238.178:
{
    "addr" : "148.251.238.178:27426",
    "addrlocal" : "",
    "services" : "00000001",
    "lastsend" : 1419674287,
    "lastrecv" : 1419674209,
    "bytessent" : 69195478,
    "bytesrecv" : 155751,
    "conntime" : 1418186362,
    "pingtime" : 0.00000000,
    "version" : 70001,
    "subver" : "/getaddr.bitnodes.io:0.1/",
    "inbound" : true,
    "startingheight" : 333662,
    "banscore" : 0,
    "syncnode" : true
}
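
A quick sketch of how a node operator could check for this inbound connection on their own node, using the getpeerinfo RPC via bitcoin-cli (an illustration; the field names match the output above).

import json
import subprocess

# Dump peer info from a local bitcoind and list inbound peers that
# advertise a bitnodes user agent.
peers = json.loads(subprocess.check_output(["bitcoin-cli", "getpeerinfo"]))
for p in peers:
    if p.get("inbound") and "bitnodes" in p.get("subver", ""):
        print(p["addr"], p["subver"], "connected since", p["conntime"])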

@luke-jr
Member

luke-jr commented Dec 27, 2014

Um, if bitnodes is maintaining a connection to every peer they can, then ACK ACK this PR...

@ayeowch
Contributor

ayeowch commented Dec 27, 2014

@luke-jr Most seeders, e.g. /bitcoinseeder:0.01/, enumerate all reachable nodes periodically anyway to get a snapshot of the network. Bitnodes simply uses this data to derive network metrics, so we can know the state/size of the network at any one time. I will ACK this PR anyway since it has come up again and again. It's quite silly.

@luke-jr
Member

luke-jr commented Dec 27, 2014

There's a big difference between crawling the network periodically and maintaining a constant connection. You'll note my DNS seed provides the same network metrics (over a longer history than yours, actually) without keeping any persistent connections...

@ayeowch
Contributor

ayeowch commented Dec 27, 2014

@luke-jr My background is in networking, so I may be looking at this from a different perspective: it seems inefficient to disconnect from nodes you are already connected to if you are only going to reconnect after hearing about them again in an addr response from a peer in the next crawl.

@laanwj
Member

laanwj commented Dec 27, 2014

@ayeowch The problem with staying connected is that you hold up a connection slot permanently that could otherwise be used by a node or SPV wallet. It is better behaved to poll once in a reasonable while, request e.g. the version, and immediately disconnect.
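
A minimal sketch of that polling pattern, assuming the mainnet P2P message format; the helper names here are illustrative, not from the bitnodes codebase.

import hashlib
import socket
import struct
import time

MAGIC = 0xD9B4BEF9  # mainnet network magic

def checksum(payload):
    # First 4 bytes of double-SHA256, as required by the message header.
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]

def message(command, payload):
    # Header: magic, 12-byte null-padded command, payload length, checksum.
    return struct.pack("<L12sL", MAGIC, command, len(payload)) + checksum(payload) + payload

def version_payload():
    # Dummy net_addr (services, 16-byte IP, port); port byte order is
    # irrelevant here since it is zero.
    addr = struct.pack("<Q16sH", 0, b"\x00" * 16, 0)
    return (struct.pack("<iQq", 70001, 0, int(time.time()))  # version, services, time
            + addr + addr                                    # addr_recv, addr_from
            + struct.pack("<Q", 0)                           # nonce
            + b"\x00"                                        # empty user agent (varstr)
            + struct.pack("<i", 0))                          # start height

def poll_once(host, port=8333):
    # Connect, announce ourselves, read the peer's version header, hang up.
    s = socket.create_connection((host, port), timeout=10)
    try:
        s.sendall(message(b"version", version_payload()))
        return s.recv(24)  # the peer's version message header
    finally:
        s.close()          # disconnect immediately; no slot held open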

@sipa
Member

sipa commented Dec 27, 2014

@ayeowch You're right that there is a certain overhead involved in connecting to and disconnecting from a peer, but it is almost nothing compared to the resources you require from them by staying connected (a few MB of RAM per connection, plus all the CPU involved in sending and tracking advertised blocks and transactions). In addition, as @laanwj already mentioned: connection slots are a limited resource that is valuable to the network. Please don't waste them.
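
As rough arithmetic (the node count is an assumption for late 2014, not a figure from this thread), the aggregate cost of one persistent connection per reachable node adds up quickly.

# Assumed figures: roughly 6,000 reachable nodes, and "a few MB"
# taken as 3 MB of RAM per connection.
reachable_nodes = 6000
ram_per_connection_mb = 3
print("inbound slots held network-wide:", reachable_nodes)  # one per node
print("aggregate RAM tied up, MB:", reachable_nodes * ram_per_connection_mb)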

@jgarzik
Contributor

jgarzik commented Dec 27, 2014

ACK, based on criteria provided.

@ayeowch
Contributor

ayeowch commented Dec 27, 2014

@laanwj @sipa Thanks for the valuable input. In the past I had configured the crawler to connect and then disconnect after a full handshake. This meant the crawler had to reconnect each time a full snapshot of the network was taken, which made it appear abusive in nature to some users.

@sipa I have monitored the bytes transferred between the crawler and each node closely, but didn't realize the undesirable RAM/CPU usage incurred just to maintain the connection.

I am going to revisit my approach in the next couple of days and, at the very least, revert the crawler to its previous behaviour, i.e. connect to and disconnect from each node to take a snapshot of the network.

@petertodd
Contributor

ACK for now, but NACK if problems are fixed in the next few days.

@sipa
Member

sipa commented Dec 28, 2014

@ayeowch Thanks for considering changing your approach.

Let's wait a few days before merging.

@gmaxwell
Contributor Author

I will continue to be uncomfortable even if the behaviour is changed.

I really don't want to get into a debate here, but we've had several instances where the behaviour of the bitnodes crawlers was bad in a way that was very obvious to all (or nearly all) of the regular contributors to the project. In each case there was some discussion, and at least I left with the impression that the behaviour would be fixed. Then later we found it doing something substantially similar, unchanged, or even worse.

Clearly there are some consistent problems with communication and/or understanding.

In the case of the dnsseed itself, much of its behaviour is even more difficult to monitor or audit than the crawler's. Plus, we have no great and urgent need for more dnsseeds; in fact, our reliance on them is lower than it was in the past. With that in mind, I'd rather not waste my cycles trying to figure out ways to determine whether this seed is behaving correctly, and I suspect that ayeowch would probably rather not spend time defending a seed service that is nothing but a cost to him, especially with us being demanding and seemingly unappreciative.

I'm currently working on tools that will help protect users and the network from abusive or intrusive behaviour like we've seen from this crawler software, regardless of how well things are or aren't communicated or what people's intentions are. So I do not consider the behaviour of those particular hosts gating here (though it should also be improved), but rather just a sign that there is some kind of persistent, serious misunderstanding and miscommunication, and I'd rather not have a DNS seed operating under that.

Cheers.

@ayeowch
Contributor

ayeowch commented Dec 28, 2014

@gmaxwell Well said Greg :) I am glad that you already have work underway to better establish what is considered abusive/non-abusive towards the network.

As much as I want to defend myself, and as much as I certainly want to help the network by offering the DNS seeder, I agree much of the problem with the crawler was due to miscommunication. English is not my first language, so that could be another reason. I'm still working on the crawler right now to avoid keeping connections open. The only immediate gating issue for me is the node alert feature on the Bitnodes site. The feature currently sends an alert to a subscribed node owner as soon as their node is disconnected from the crawler, so I will have to think of another way to do this. If you have thoughts about this or any issues regarding the crawler, please feel free to let me know (ayeowch@gmail.com).

At any rate, I am now less than comfortable having my DNS seeder in the reference implementation, so let's proceed to ACK the removal for now.

@luke-jr
Member

luke-jr commented Dec 28, 2014

I think it's probably fair to take up a connection slot on a node that someone has manually requested monitoring of.

@zander

zander commented Dec 29, 2014

Using the new peers view I see 10 incoming connections with the user agent bitnodes.io; that app needs fixing!

Besides that, it makes no sense to have a crawler in the seed: users should find nodes through it, not the other way around.

ACK.

@ayeowch
Contributor

ayeowch commented Dec 29, 2014

@zander I noticed the 10 crawlers coming from 54.x.x.x too; they are not mine and do not contribute to the stats on the project website at getaddr.bitnodes.io. I mentioned this in my second comment in this issue. I reverted my crawler earlier today so that a single instance performs a full handshake with each reachable node and disconnects immediately. It will, however, reconnect after a full snapshot of the network is taken, i.e. approximately 5 to 8 minutes later. You should be able to see it appearing in your getpeerinfo output as an inbound connection from 148.251.238.178.
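
A sketch of that reverted behaviour; get_reachable_nodes and handshake_once are hypothetical placeholders, not functions from the crawler.

import time

def crawl_forever(get_reachable_nodes, handshake_once, interval=6 * 60):
    # One snapshot per pass: handshake with each reachable node, then wait.
    while True:
        for host in get_reachable_nodes():
            try:
                handshake_once(host)  # connect, full handshake, disconnect
            except OSError:
                pass                  # unreachable this round; skip it
        time.sleep(interval)          # roughly the 5-8 minutes quoted above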

@laanwj laanwj merged commit a094b3d into bitcoin:master Dec 31, 2014
laanwj added a commit that referenced this pull request Dec 31, 2014
a094b3d Remove bitnodes.io from dnsseeds. (Gregory Maxwell)
gmaxwell added a commit that referenced this pull request Dec 31, 2014
I'm not comfortable with retaining this entry.

Rebased-From: a094b3d
Github-Pull: #5545
@mikehearn
Contributor

I think I will be keeping Addy's seed in both bitcoinj and Bitcoin XT for a couple of reasons:

  1. I don't see how removing a seed helps Bitcoin users. Ignoring for a moment that several popular services hold open lots of connections to the p2p network, the fix for this is discussion, dialog and new features, not reducing startup redundancy in Bitcoin Core.

  2. There's a written DNS seed policy, which currently says nothing about the behaviour of the associated crawlers. Rather than just targeting Addy's service specifically, if this is really the highest priority it'd be better to come up with a specific policy and give seed operators notice, time for transition, etc.

  3. Even if we accept the rationale that removing seeds as 'punishment' is a good way to effect change, Addy has now modified his crawler but the commit was merged anyway. Why?

Addy, for what it's worth, your English is excellent. I don't think the issue here is language-related.

@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021