Propagate validator lists (VLs or UNLs) over the peer network: #3072
Codecov Report

```diff
@@            Coverage Diff             @@
##           develop    #3072     +/-  ##
===========================================
- Coverage    70.78%   70.55%   -0.23%
===========================================
  Files          692      689       -3
  Lines        55244    54592     -652
===========================================
- Hits         39104    38517     -587
+ Misses       16140    16075      -65
```

Continue to review the full report at Codecov.
How hard/easy is it to spam the network with a few GB/TB of validly signed VLs?
This is a great question that underlies an important concern. The answer is "It's nigh impossible". First, a server will only ever relay lists from publishers that they trust, meaning that someone can't just generate a million keypairs and use them to sign and "publish" a million lists. Second, the server will only ever relay the latest list from a trusted publisher. So, if someone just sends old lists with a sequence number earlier than what the server considers to be "current" they won't be relayed. I believe that these two safeguards make this functionality unsuitable for spamming.
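To make the two safeguards concrete, here is a minimal sketch of a relay gate. It is only a simplified model: `std::string` stands in for the publisher's public key, and the real logic lives in rippled's `ValidatorList` and `HashRouter`, not in a standalone class like this.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Simplified sketch of the relay decision described above.
struct RelayGate
{
    std::set<std::string> trustedPublishers;         // configured list publishers
    std::map<std::string, std::uint32_t> latestSeq;  // highest sequence seen per publisher

    // Relay only if the list is (1) from a trusted publisher and
    // (2) newer than anything already accepted from that publisher.
    bool shouldRelay(std::string const& publisher, std::uint32_t sequence)
    {
        if (trustedPublishers.count(publisher) == 0)
            return false;  // safeguard 1: untrusted publisher, drop
        auto const it = latestSeq.find(publisher);
        if (it != latestSeq.end() && sequence <= it->second)
            return false;  // safeguard 2: stale or duplicate sequence, drop
        latestSeq[publisher] = sequence;
        return true;
    }
};
```

Under this model, a flood of lists from unknown keys never passes the first check, and replayed old lists never pass the second.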
A widely trusted VL publisher can attack the network that way though, right? Just by publishing a huge number of lists, instead of actively maliciously publishing a list that grants them full control over the network. There might be legitimate use cases for an automated VL provider to publish a new list every ledger close (e.g. to show that the provider is still active and is actively monitoring the network by including the latest ledger hash as extra data in the list). Also, these lists can grow relatively large.

Just to be clear: as I understand it, a server can receive lots of VLs from different issuers in lots of versions, but will only relay a single one (the most recent VL of the issuer it trusts) to its peers? This leaks private information (which VL is trusted by a server) and makes it very hard to even have a second VL provider in the same network using this mechanism, since their lists can only get propagated via direct connections instead of hopping over peers.

I also have concerns about leaking private in-cluster VLs (if one operates several servers but wants to retain control over their UNL, the best way currently is to set up a private VL for that cluster and have the cluster members trust it). Just turning off the endpoint doesn't seem to be enough to stop this information (whether a VL is trusted and, if yes, which exact version by which provider) from leaking right after connecting to a server.
There are additional protections at the individual node level. The big one is that
Are you talking about a scenario where the publisher creates a valid list, pushes it out through its node, then immediately creates another list, over and over? I believe that would quickly lead to that publisher losing trust. If enough nodes stop trusting the publisher, the damage may be contained, since those nodes will not relay. I also think the standard peer fee may kick the "source" node out eventually if the flooding is fast enough. I can see it being worthwhile to increase the peer fee for "success" to mitigate this scenario, too. Finally, as you mention yourself, there are a lot more effective ways for a trusted VL publisher to attack and damage the network. That said, I could see value in some kind of outgoing rate limit for well-behaved peers so that they won't relay for a publisher if they've relayed too recently.
Mostly correct. If a server trusts more than one VL publisher, it will relay the most recent VL from each of the issuers it trusts. But that's splitting hairs.
I could see value in a configuration option to disable sending VLs to peers if someone is concerned about their trust map or a private VL. I suspect a small minority of servers will use it, but even if it is popular, I bet having public hubs and other well-connected nodes doing the broadcast may be sufficient to keep the network healthy.
I already described a valid (and maybe useful?) use case: publish a VL for every closed ledger. Anyway, all in all it would only allow someone to shoot themselves in the foot; others would just drop connections after a while if that node tries to send a few gigabytes of VLs upon connection, and not propagate them. This still leaves my concern about VL centralization - how does this scheme act in the presence of several widely trusted VL publishers? After all, validations themselves are always forwarded (up to traffic limits etc.) - maybe recent VLs that have x% overlap (80%? 90%?) with a server's UNL also should be forwarded, even if they are not trusted?
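For what it's worth, the overlap heuristic floated here could look something like the sketch below. The 80% default threshold and the set-of-strings UNL representation are assumptions for illustration only, not anything in the PR.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Sketch: forward an untrusted VL only if it shares at least `threshold`
// of our own UNL. Keys are represented as strings for simplicity.
bool sufficientOverlap(std::set<std::string> const& ourUnl,
                       std::set<std::string> const& listedValidators,
                       double threshold = 0.8)  // e.g. 80% overlap required
{
    if (ourUnl.empty())
        return false;
    auto const shared = std::count_if(
        ourUnl.begin(), ourUnl.end(),
        [&](std::string const& v) { return listedValidators.count(v) > 0; });
    return static_cast<double>(shared) / static_cast<double>(ourUnl.size())
        >= threshold;
}
```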
Correct: a server can receive lots of VLs from different issuers, and multiple VLs from the same issuer, but it will only ever forward the "most recent" VL from publishers that it trusts.
To be honest, I don't know that I consider that "private information" though, of course, your mileage may vary and that's just fine.
I find that scenario "broken" personally, but that's another topic entirely. A UNL publisher can already sorta-kinda do that: publish a list that expires 5 seconds into the future, thus forcing anyone who trusts them to keep connecting to them to retrieve the list.
It makes no real difference, as far as I can tell. It can cause a slight, bounded increase in the bandwidth used by the overlay, but it offers a critical service to other servers in return and removes a potential point of centralization (a web server). If the bandwidth concerns are significant, we could always flip the model on its head by implementing a "request" model ("Hey, I'm looking for the UNL signed by <publisher key>"). In that case, servers could simply opt to cache and serve UNLs from publishers they don't trust as well.
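A rough sketch of what that pull-style exchange might look like; the message and field names here are hypothetical and not part of the actual rippled peer protocol.

```cpp
#include <cstdint>
#include <string>

// Hypothetical request: "I'm looking for the UNL signed by <this key>".
struct TMGetValidatorList
{
    std::string publisherKey;    // publisher whose list is wanted
    std::uint32_t haveSequence;  // newest sequence the requester already holds,
                                 // so the responder can skip sending stale data
};

// Hypothetical reply: the same triple the HTTP download carries today.
struct TMValidatorListReply
{
    std::string manifest;   // publisher manifest
    std::string blob;       // signed list payload
    std::string signature;  // signature over the blob
};
```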
A server only forwards validations from trusted validators; validations from untrusted validators aren't regularly forwarded. We can have a discussion about whether and how untrusted validations should be forwarded, but this PR isn't the right place for that.
I think the request model might be better (in all cases except the very first connection, this data isn't needed immediately anyway), or a hybrid approach where servers only send the VL(s) + sequence number(s) they currently would offer instead of the full data all the time, so it is easier to choose where to request the latest one from. By the way, since this changes the p2p protocol by adding a message type, should this also bump the protocol version?
I'm not sure that's a particularly useful use case, but even if someone did want to implement this, publishing a new VL every 3-4 seconds is not as bad as what I thought you were talking about, which is a scenario of many VLs per second. Even so, for that to be a problem, the publisher would have to be pretty widely trusted - else the VLs aren't going to propagate very far - and the way I see it, if such a publisher is widely trusted, then that's a significant chunk of the network saying that such behavior is desirable.
One advantage of the push model is that VLs can potentially propagate as soon as they're updated and the first node grabs a copy. Also, the
I don't think so: the behavior is backward compatible in the sense that any node that doesn't know about this message type will simply ignore a VL message it receives.
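In other words, dispatch on the receiving side might look roughly like the following; the enum values are illustrative, not rippled's actual message numbering.

```cpp
// Illustrative message-type enum; real rippled types live in the
// protocol buffers definitions.
enum MessageType { mtPING, mtVALIDATORLIST /* , ... */ };

// A receiver dispatches on the message type and silently drops anything
// it doesn't recognize, so adding mtVALIDATORLIST can't break old nodes.
void onMessage(MessageType type)
{
    switch (type)
    {
        case mtVALIDATORLIST:
            // apply and possibly relay the list
            break;
        default:
            break;  // unknown message type: ignored, connection stays up
    }
}
```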
This could also happen in the pull scenario, by notifying peers about a new VL once one is detected (unless VL versions are communicated only in the handshake).
If it runs

PS: I'd really love to have an option to store VL versions semi-permanently, and as I already wrote in the issue for this PR: why are they not added to a database but dumped as JSON? If they got stored in e.g.
I think that's a neat option for anyone interested in maintaining a "historical UNL" archive, but I'm not sure that there's much of a benefit in coding this functionality inside `rippled`. What do you believe are the advantages of having this functionality inside `rippled`?
While you're right that a peer that receives this message will ignore it, I think that it's a good habit for us to get into to "bump up" the wire protocol version when changes are made to the wire protocol. I don't think any changes are needed now, but if this goes in after #3049, then we maybe want to gate this as an
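A sketch of what such gating could look like, assuming a peer object that exposes the protocol version negotiated at handshake (names here are hypothetical; per the commit notes below, the shipped change does send VLs only to peers speaking protocol 2.1 or later).

```cpp
#include <utility>

// Send VLs only to peers that negotiated protocol >= 2.1.
constexpr std::pair<int, int> vlMinProtocol{2, 1};

template <class Peer, class Message>
void maybeSendValidatorList(Peer& peer, Message const& msg)
{
    // std::pair compares lexicographically: (2, 0) < (2, 1) < (3, 0),
    // so this correctly handles future major-version bumps too.
    if (peer.protocolVersion() >= vlMinProtocol)
        peer.send(msg);  // older peers never see the new message type
}
```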
Nice job with these changes.
If I understand correctly, we'd now have four different ways to infuse validator list data into the server:
1. http downloads from configured sites
2. local file:// url "downloads"
3. peer network messages
4. locally cached data (from a prior successful "infusion")
That seems like a lot of ways to configure the list, and I'm mildly concerned about all this complexity for what should be simple configuration data. I also feel like there is a certain amount of overlap between 2 and 4, since they are both local-filesystem based.
Can someone explain the motivating factors behind this change? Do we fundamentally believe that http GET is an inappropriate way to retrieve this data? Do we need something more "lively" that can push out updates nearly immediately? If so, do we want the peer network to be that mechanism? Or perhaps we should allow websocket subscriptions to some sort of VL feed? Again, I'm unclear on the motivating requirements here, so my questions might be misguided.
If we do believe the peer network is the right mechanism for distributing this data, it seems like we should provide a way to publish directly onto the peer network and ditch the http downloads completely. To that end, imagine we provided an admin API to load validator lists. Simple external tools could then be used to fetch from the local filesystem, urls, gossip networks, config management systems, etc. - and then would just post the data into rippled. That would move some of this complexity out of the core. The update/load API could presumably also trigger publication onto the peer network, if that's what we want.
src/ripple/app/main/Application.cpp (outdated)

```cpp
add (*overlay_); // add to PropertyStream

validators_->setMessageCallback(
    [&](uint256 const& hash,
```
this particular bit of lambdas within lambdas is pretty hard to read. I normally wouldn't comment directly on formatting, but in this case it's almost inscrutable. To that end, I'd consider (1) letting ClangFormat reformat this particular block (I tried it and it looked slightly better to me), and (2) a few well-named local variables within the block might help de-obfuscate some of what's going on.
Rewritten to use local variables, and to get rid of the callbacks entirely.
src/ripple/app/main/Application.cpp (outdated)

```cpp
overlay_->foreach_mutable(send_and_modify_if_not(
    std::make_shared<Message>(msg, protocol::mtVALIDATORLIST),
    peer_in_set_or_condition(*toSkip,
        [&](std::shared_ptr<Peer> const& peer) {
```
I think the capture list could be explicit here (just `publisherKey` and `sequence`?)
Done.
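For illustration, the suggested explicit capture might look like this minimal sketch; `Peer` is a stand-in type, the predicate body is elided, and the surrounding function is hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Peer {};  // stand-in for ripple::Peer

void broadcast(std::vector<std::shared_ptr<Peer>> const& peers,
               std::size_t publisherKey,  // stand-in for a PublicKey
               std::size_t sequence)
{
    // Explicit capture list: only the two names the body actually uses,
    // so the lambda's dependencies are visible at a glance.
    auto const shouldSkip =
        [publisherKey, sequence](std::shared_ptr<Peer> const&) {
            // ... e.g. ask the hash router whether this peer has already
            // seen this publisher/sequence pair ...
            (void)publisherKey;  // silence unused warnings in this sketch
            (void)sequence;
            return false;
        };

    for (auto const& peer : peers)
        if (!shouldSkip(peer))
        {
            // send the mtVALIDATORLIST message to this peer
        }
}
```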
src/ripple/app/misc/ValidatorList.h (outdated)

```cpp
protocol::TMValidatorList const& msg,
PublicKey const& publisherKey,
size_t sequence)> sendMessage_;
std::function<void(std::vector<std::string> const& sites)>
```
I sort of understand the need for the `sendMessage` callback, but it looks to me like `loadValidatorSites_` is only used by `loadLists()`, which is itself only ever called by `ValidatorSites`. Why not either (1) just have `ValidatorSites` pass the callback when it calls `loadLists()` or, better yet, (2) just have `ValidatorSites` do whatever it needs to immediately after invoking `ValidatorList::loadLists` (since the callback is tail-called anyhow)? Seems to me that would be simpler to understand... and I dislike callbacks for stuff that can be invoked directly by name.
Inspired by this comment, I completely removed the callbacks.
```cpp
std::string siteUri,
uint256 const& hash)
{
    auto const result = applyList(manifest, blob, signature,
```
I think you could `std::move` the `siteUri` (forwarded) argument here.
Done.
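The suggestion in miniature, with hypothetical function names: since `siteUri` is taken by value, its final use can transfer the buffer instead of copying it.

```cpp
#include <string>
#include <utility>

// Hypothetical callee, standing in for ValidatorList::applyList.
void applyList(std::string manifest, std::string blob, std::string siteUri);

void onSiteFetch(std::string manifest, std::string blob, std::string siteUri)
{
    // All three parameters were taken by value and this is their last
    // use, so std::move transfers each buffer instead of copying it.
    applyList(std::move(manifest), std::move(blob), std::move(siteUri));
}
```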
```cpp
*/
struct PublisherListStats
{
    explicit PublisherListStats(ListDisposition d)
```
this class is now a kinda-sorta union type where the new fields are only valid for certain values of disposition (I think). I realize it's just a return-value class, but this feels a little unsafe. Maybe make those optional fields `optional` types - then we'll throw if someone tries to use them when they are unset.
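A sketch of the suggested shape: `std::optional` is shown here for brevity (the codebase at the time would more likely have used `boost::optional`), and the enum values and field names are illustrative.

```cpp
#include <cstddef>
#include <optional>

enum class ListDisposition { accepted, stale, untrusted /* , ... */ };

struct PublisherListStats
{
    explicit PublisherListStats(ListDisposition d) : disposition(d) {}

    ListDisposition disposition;

    // Only meaningful when a list was actually accepted; reading it via
    // value() while unset throws std::bad_optional_access instead of
    // silently returning garbage.
    std::optional<std::size_t> sequence;
};
```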
Yeah, that makes sense. Done.
```diff
@@ -309,7 +370,101 @@ ValidatorList::applyList (
    }
}

return ListDisposition::accepted;
// Cache the validator list in a file
if (!dataPath_.empty())
```
minor: the "write info to a cache file" is sufficiently stand-alone that I'd prefer to see it as its own function.
Done.
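A hypothetical shape for that extraction, using `std::filesystem` as a stand-in for rippled's own filesystem wrappers and error handling; the `cache.` file-name scheme matches the one described in the commit notes below.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Sketch of pulling the "write info to a cache file" block out into its
// own function, as suggested. Names and signature are hypothetical.
void cacheValidatorFile(std::filesystem::path const& dataPath,
                        std::string const& pubKeyHex,
                        std::string const& listJson)
{
    if (dataPath.empty())
        return;  // no database_dir configured: caching disabled

    // e.g. <database_dir>/cache.<publisher public key>
    std::ofstream out{dataPath / ("cache." + pubKeyHex)};
    out << listJson;
}
```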
```cpp
dataPath_ / (filePrefix_ + strHex(pubKey)) };

auto const fullPath{ canonical(filename, ec) };
if (!ec)
```
looks like we're not really doing anything with `ec` here, so I think it might be slightly simpler to just do `if (ec) continue;` in a few places
Yeah, makes sense. Done.
```cpp
}

// Then let the ValidatorSites do the rest of the work.
if (!sites.empty() && loadValidatorSites_)
```
as I mentioned above, I'd be interested to know if we can just let ValidatorSites take care of this callback functionality since it's always the one calling this method.
Done.
src/ripple/app/main/Application.cpp (outdated)

```diff
@@ -571,6 +573,12 @@ class ApplicationImp
m_nodeStoreScheduler.setJobQueue (*m_jobQueue);

add (m_ledgerMaster->getPropertySource ());

validators_->setLoadSitesCallback(
    [&](std::vector<std::string> const& sites) {
```
can we capture `validatorSites_` explicitly?
No longer relevant
src/ripple/app/main/Application.cpp (outdated)

```cpp
protocol::TMValidatorList const& msg,
PublicKey const& publisherKey,
size_t sequence) {
    assert(overlay_);
```
if we only need these two members (`overlay_`, `hashRouter_`), maybe capture them explicitly too?
No longer relevant
Thank you. Great feedback here, too.
To split hairs, (4) uses (2), so I'd kinda count those as one item.
IIRC, the original motivation was back when there were several users who were having an extremely difficult time retrieving the list. Even though we were able to fix or work around many of those causes, my understanding is that there are still risks that new nodes, or network changes on existing nodes, will make it difficult to get the list - for example, badly configured firewalls or limited network censorship. Additionally, this helps mitigate the risk of a web server, including the one operated by Ripple at vl.ripple.com, becoming a single point of failure. If a server goes down (or gets attacked at an inopportune time), the nodes using that VL risk not being able to make forward progress.

This new feature gives the publisher the opportunity to provide an alternate path to publish - they could inject the VL into one of their own nodes via a local file and get it distributed. However, that method is not guaranteed to reach every other node using that VL. Sure, for any server in the default config that probably won't be a problem, because a node using it is likely to be highly connected to other nodes using it, but the consumers of a less popular publisher are less likely to have a direct path to one of those nodes, and may need to use GET to make that leap. Another way to look at it: belt and suspenders.
No. I think GET is still appropriate, at least for now; this change just makes it optional. In my testing, I set up two nodes in each other's
I think these are good questions, so don't worry about that. I don't think liveliness is required right now, but I do think that it may enable some interesting options in the future. I don't think websocket subscriptions from node to node are necessarily going to be more useful than GET, but it is worth thinking about for the future.
I can see an advantage to loading via API in the future, but I think
I'm making this PR a dependent of #3049 so I don't have to reinvent the wheel with the protocol change. As of this moment, all I've done is rebase on @nbougalis's branch and fix conflicts. I'll be adding the
fwiw here's an ancient discussion of forwarding lists: #1842 (comment)
@wilsonianb Thanks for the pointer. I read over the discussion and code changes, and other than a little bit of "I could have saved a chunk of time if I had seen this sooner", I don't see any fundamental / breaking differences between the two implementations. If I missed anything, I'd be happy to look closer.
Rebased onto 1.5.0-b3 and resolved conflicts.
Squashed and rebased onto 1.5.0-b4.
Returns local validator details and specified manifest information respectively. Folded and rebased on latest develop.

account_info, owner_info, account_currencies

* Fixes RIPD-1759

* When an unknown amendment reaches majority, log an error-level message, and return a `warnings` array on all successful admin-level RPC calls to `server_info` and `server_state` with a message describing the problem and the expected deadline.
* In addition to the `amendment_blocked` flag returned by `server_info` and `server_state`, return a warning with a more verbose description when the server is amendment blocked.
* Check on every flag ledger to see if the amendment(s) lose majority. Log again if they don't; resume normal operations if they did.

The intention is to give operators earlier warning that their instances are in danger of becoming amendment blocked, which will hopefully motivate them to update ahead of time.

* Example: `gcc.Debug` will use the default version of gcc installed on the system; `gcc-9.Debug` will use version 9, regardless of the default. This will be most useful when the default is older than required or desired.

* Metrics are now exported over insight.
* Fixes a minor bug that affected the reporting of gauges.

* Update EP and find package requirements.
* Minor protobuf/libarchive build fixes.
* Change Travis release builds to nounity to ameliorate VM memory exhaustion. FIXES: XRPLF#3223, XRPLF#3232
* Whenever a node downloads a new VL, send it to all peers that haven't already sent or received it. It also saves it to the database_dir as a JSON text file named "cache." plus the public key of the list signer. Any files that exist for public keys provided in [validator_list_keys] will be loaded and processed if any download from [validator_list_sites] fails or no [validator_list_sites] are configured.
* Whenever a node receives a broadcast VL message, it treats it as if it had downloaded it on its own, broadcasting to other peers as described above.
* Because nodes normally download the VL once every 5 minutes, a single node downloading a VL with an updated sequence number could potentially propagate it across a large part of a well-connected network before any other nodes attempt to download, decreasing the amount of time that different parts of the network are using different VLs.
* Send all of our current valid VLs to new peers on connection. This is probably the "noisiest" part of this change, but will give poorly connected or poorly networked nodes the best chance of syncing quickly. Nodes which have no http(s) access configured or available can get a VL with no extra effort.
* Requests on the peer port to the /vl/<pubkey> endpoint will return that VL in the same JSON format as is used to download now, IF the node trusts and has a valid instance of that VL.
* Upgrade protocol version to 2.1. VLs will only be sent to 2.1 and higher nodes.
* Resolves XRPLF#2953