New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Ban scoring #1543
[RFC] Ban scoring #1543
Conversation
5330ea1
to
49aac0b
Compare
I've been working with the rule that a peer gets banned if its trust score is ≤ -1. It makes the math slightly cleaner, but is arguably less human-readable. The difference is purely cosmetic since we can scale the punishment values to match. One way of thinking about how much to punish peers is to decide the maximum rate they can do whatever we're talking about without being banned. From the code I'm working on: (** Trust increment that sets a maximum rate of doing a bad thing (presuming the
peer does no good things) in seconds/action. *)
(* The amount of trust that decays in the specified time period, when we're at
the ban threshold (-1) *)
let max_rate secs = ((Record.decay_rate ** secs) *. -1.) +. 1. Where |
rfcs/0011-ban-scoring.md
Outdated
Let's classify those places where punishment has been flagged, and annotate | ||
them with suggested constructors: | ||
|
||
- in `bootstrap_controller.ml`, for bad proofs (SEV), and a validation error when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be best if we separated the set of offenses in two.
-
Absent hardware failures, an honest peer never does this. Should be an insta-ban, since it's a clear signal the peer is faulty or malicious.
-
Honest peers may do this if they are out of sync with us. Set punishment level based on the cost to us of receiving/processing the bad message. Not sure how that converts into a number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there's still room for trivial infractions, the kind where Bitcoin increments the ban score by one, like a duplicate version message. We could see that kind of thing if there are multiple Coda implementations.
That seems to be a consideration for DoS behavior as opposed to giving-bad-information behavior, which may be Bitcoin considers those separately. |
locations where punishment is warranted. | ||
|
||
In the current code, there are often calls to the logger where | ||
punishment is mentioned in a `TODO`. There could be an API that calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the trust scores code I have:
val record : t -> peer -> action -> unit
(** Record an action a peer took. This may result in a ban event being
emitted *)
Where action
is a type we functor over. The intent is that calling that function emits a log message, updates the trust score, and potentially writes a ban event to a pipe. The module passed to the functor needs to provide a function action -> string
so we can write useful log messages. It's an open question whether action
should be a variant that enumerates everything that affects trust scores or just something like float * string
. The former keeps all the trust increments together and makes changing them as a set easier, but is likely to lead to a really big variant. I'm kinda leaning towards the latter now, but 🤷♂️.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of a variant so we can keep infraction types next to each other and indexed in one spot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Bitcoin sources, where misbehavior is flagged, sometimes they provide a message, sometimes not. We should always provide a message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of record
can use the variant of bad behavior severities.
@enolan could put up a PR with the trust stuff you've been working on in an |
I can put that up in the AM. |
rfcs/0011-ban-scoring.md
Outdated
- in `ledger_catchup.ml`, when a root hash can't be found (SEV), or a peer returns an empty list | ||
of transitions (instead of `None`) (TRV) | ||
- in `linked_tree.ml`, for peers requesting nonexistent ancestor paths (MOD) | ||
- in `parallel_scan.ml`, in `update_new_job` for unneeded merges (?) (MOD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be SEV
though
So the higher the rate, the less the punishment? How does this rate idea combine with the potential harm of the bad action? I'm wondering if this is more complexity than needed. I was thinking that the potential harm alone would determine the severity of punishment. |
Addressed @bkase comments. |
- in `catchup_scheduler.ml` and `processor.ml`, when a breadcrumb can't be built from a | ||
transition (SEV) (same failure as in `bootstrap_controller.ml`, above) | ||
- in `ledger_catchup.ml`, a transition could not be validated (SEV) | ||
- in `transaction_pool.ml`, a payment check fails (SEV) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also missing from here is the bit that Echo is adding about sending a transaction that doesn't make sense on the current staged ledger.
This is the bit that justifies the trust scoring I think. I think Echo was mentioning this somewhere, but if you receive transactions at high-velocity from some node and 99% of them work on the staged ledger, but some of them don't due to inconsistent state between the two nodes, we'd want that to be okay.
Whereas if you get one transaction each day from a node, maybe on the second or third wrong one, you'd want to ban it.
This is something unique to Coda w.r.t. Bitcoin because we expect to support high-transaction throughput.
However, on the one hand, you could argue that we should be banning based on how much pain or wasted work was inflicted by the accused. In that case, you'd want to ban if you ever receive some constant amount of failures from a node, despite seeing correct ones and no matter in what rate you're receiving them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added section about trust scoring
rfcs/0011-ban-scoring.md
Outdated
@@ -144,7 +144,8 @@ history leading to a ban score. | |||
|
|||
## Prior art | |||
|
|||
There is existing code in Coda to maintain a set of peers banned by IP address. | |||
There is existing code in Coda to maintain a set of peers banned by IP address in | |||
`banlist_lib`. It will be superseded this RFC and RFC 0010 are implemented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing word: "when"
Other than the typo, I'm good. |
This is a proposed RFC for ban scoring.