Pass validation_callback through catchup process #10895
I think more work is needed here on how we handle errors when processing rose trees of blocks. When we have a rose tree of blocks, we don't want to reject an entire subtree just because a single block was invalid, which ends up being a bit tricky in practice. As a general refactor, any function that yields errors while operating on a tree of blocks should return a tree of errors, so that we can map over the input tree of blocks and the output tree of errors together and make proper decisions about how to fill the validation callbacks in such scenarios. Reading the code, even apart from your changes, I think there are additional bugs hidden in how we handle errors, but those can be addressed in additional PRs (for example: if a single block in a rose tree is invalid, I believe we will drop all the valid blocks in the tree as well and not actually add them to our frontier).
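To make the tree-of-errors idea concrete, here is a rough, self-contained sketch. Everything in it is a stand-in: the `rose` type is not the project's `Rose_tree`, `block` is a toy record, and the `outcome` type and the policy in `decide` (reject an invalid block, ignore valid descendants of an invalid block, accept the rest) are assumptions made for illustration, not the behavior this PR implements.

```ocaml
(* Minimal rose tree; a stand-in for the project's Rose_tree module. *)
type 'a rose = T of 'a * 'a rose list

(* Hypothetical callback outcomes, for illustration only. *)
type outcome = Accept | Reject | Ignore

(* Toy block: real validation is far richer than a boolean flag. *)
type block = { name : string; valid : bool }

(* Apply [f] to every node, preserving the shape of the tree. *)
let rec map f (T (x, children)) = T (f x, List.map (map f) children)

(* Pair up two trees of identical shape; List.map2 raises
   Invalid_argument if the shapes differ. *)
let rec zip (T (a, xs)) (T (b, ys)) = T ((a, b), List.map2 zip xs ys)

(* Validate every block on its own, yielding a tree of results
   instead of a single error for the whole tree. *)
let validate_tree tree =
  map
    (fun b -> if b.valid then Ok () else Error ("invalid block " ^ b.name))
    tree

(* Assumed policy: an invalid block is rejected, a valid block below an
   invalid ancestor is ignored, and every other block is accepted. *)
let rec decide ancestors_ok (T ((b, result), children)) =
  let outcome =
    match (result, ancestors_ok) with
    | Error _, _ -> Reject
    | Ok (), true -> Accept
    | Ok (), false -> Ignore
  in
  let ok = ancestors_ok && Result.is_ok result in
  T ((b.name, outcome), List.map (decide ok) children)

(* Pre-order flattening, used here only to inspect the decisions. *)
let rec to_list (T (x, children)) =
  x :: List.concat (List.map to_list children)

(* a is valid; its child b is invalid and has a valid child c;
   a's other child d is valid. *)
let example =
  T
    ( { name = "a"; valid = true }
    , [ T ({ name = "b"; valid = false }, [ T ({ name = "c"; valid = true }, []) ])
      ; T ({ name = "d"; valid = true }, [])
      ] )

let outcomes = to_list (decide true (zip example (validate_tree example)))
```

With this shape, the sibling `d` keeps its own result even though `b` failed, which is the property the comment above is asking for.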
Option.value_map valid_cb ~default:() ~f:(fun data ->
match Hashtbl.add t.validation_callbacks ~key:hash ~data with
| `Ok ->
don't_wait_for
NIT: this can be written more cleanly as upon (Deferred.ignore_m @@ Mina_net2.Validation_callback.await data) (fun () -> Hashtbl.remove t.validation_callbacks hash).
@@ -45,45 +47,40 @@ type t =
Cached.t
list
State_hash.Table.t
(* Validation callbacks for state hashes that are being processed *)
; validation_callbacks : Mina_net2.Validation_callback.t State_hash.Table.t
Wouldn't it be simpler to just include this in the collected_transitions hash table? Then we don't have to maintain the state of 2 different hash tables.
The code would be slightly trickier. Right now clean-up is trivial: when a callback resolves, its entry is removed.
But it's certainly possible.
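For reference, a rough sketch of what the single-table variant discussed above could look like. The `entry` record, the string stand-ins for transitions and callbacks, and the function names are all hypothetical; the sketch uses the standard library `Hashtbl` rather than the `State_hash.Table` from the diff. It shows the extra care clean-up needs: resolving a callback may only clear the callback slot, because the entry still owns the collected transitions.

```ocaml
(* Hypothetical stand-ins: transitions and callbacks are plain strings. *)
type entry =
  { transitions : string list
  ; mutable callback : string option (* None once resolved or never set *)
  }

(* One table keyed by state hash, holding both pieces of state. *)
let collected : (string, entry) Hashtbl.t = Hashtbl.create 16

let add ~hash ~transitions ~callback =
  Hashtbl.replace collected hash { transitions; callback }

(* With two tables this step is a plain removal of the entry. With one
   table the entry must survive, so only the callback slot is cleared. *)
let on_callback_resolved ~hash =
  match Hashtbl.find_opt collected hash with
  | Some entry -> entry.callback <- None
  | None -> ()

let () =
  add ~hash:"h1" ~transitions:[ "t1"; "t2" ] ~callback:(Some "cb1") ;
  on_callback_resolved ~hash:"h1"
```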
List.iter cached_transitions
~f:(Fn.compose ignore Cached.invalidate_with_failure) ) ;
Hashtbl.iter collected_transitions
~f:(List.iter ~f:(Fn.compose ignore Cached.invalidate_with_failure)) ;
We should also fill the validation callbacks here with Ignore.
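A minimal sketch of that clean-up step, assuming a fire-once callback. The `callback` record, `fire_once`, and `ignore_all` are hypothetical stand-ins written for this example; they are not the `Mina_net2.Validation_callback` API.

```ocaml
type outcome = Accept | Reject | Ignore

(* Hypothetical fire-once callback: [result] is None until it is filled. *)
type callback = { mutable result : outcome option }

(* Fill the callback unless an earlier decision already filled it. *)
let fire_once cb outcome =
  match cb.result with None -> cb.result <- Some outcome | Some _ -> ()

(* On clean-up, fill every pending callback with Ignore and then drop
   all entries, so no callback is left unresolved. *)
let ignore_all (callbacks : (string, callback) Hashtbl.t) =
  Hashtbl.iter (fun _hash cb -> fire_once cb Ignore) callbacks ;
  Hashtbl.reset callbacks
```

Callbacks that were already accepted or rejected keep their earlier result; only the pending ones end up as `Ignore`.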
@@ -107,7 +104,8 @@ let create ~logger ~precomputed_values ~verifier ~trust_system ~frontier
$error"
~metadata:[ ("error", Error_json.error_to_yojson err) ] ;
List.iter transition_branches ~f:(fun subtree ->
Rose_tree.iter subtree ~f:(fun cached_transition ->
Rose_tree.iter subtree ~f:(fun (cached_transition, _vc) ->
(* TODO reject callback? *)
We need to Reject blocks here depending on the reason they failed when building the tree of breadcrumbs.
( List.map acc ~f:Cached.invalidate_with_failure
: Mina_block.initial_valid_block Envelope.Incoming.t list ) ;
List.iter acc ~f:(fun (node, _vc) ->
(* TODO reject callback? *)
Yes, but it depends on the type of error triggered in verify_transition. We probably need to change the error type of that function into something more meaningful in order to handle this.
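One possible shape for a more meaningful error type, as a sketch only. The `verify_error` constructors and the mapping in `outcome_of_result` are invented for this example; the real failure modes of `verify_transition` would have to be read off its implementation.

```ocaml
type outcome = Accept | Reject | Ignore

(* Hypothetical structured error: it separates "the block is invalid"
   from "we could not finish checking the block". *)
type verify_error =
  | Invalid_proof of string
  | Verifier_failure of string
  | Timed_out

(* Only a proven-invalid block is rejected. Infrastructure failures say
   nothing about the block itself, so those callbacks are ignored. *)
let outcome_of_result = function
  | Ok () -> Accept
  | Error (Invalid_proof _) -> Reject
  | Error (Verifier_failure _ | Timed_out) -> Ignore
```

With a type like this, each `TODO reject callback?` site could pattern match on the error instead of guessing why the node failed.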
@@ -293,11 +293,13 @@ let remove_node' t (node : Node.t) =
()
| To_initial_validate _ ->
()
| To_verify c ->
| To_verify (c, _vc) ->
(* TODO should we reject the validation callback? *)
We should fill the validation callback with Ignore here.
(let metadata =
[ ("state_hash", node.state_hash |> State_hash.to_yojson) ]
in
Option.value_map valid_cb
Ditto.
(let metadata =
[ ("state_hash", node.state_hash |> State_hash.to_yojson) ]
in
Option.value_map valid_cb
Ditto.
(let metadata =
[ ("state_hash", node.state_hash |> State_hash.to_yojson) ]
in
Option.value_map valid_cb
Ditto.
)
|> ignore ) )
Rose_tree.iter subtree ~f:(fun (node, _vc) ->
(* TODO reject callback? *)
Depends on the error.
@@ -863,6 +874,7 @@ let setup_state_machine_runner ~t ~verifier ~downloader ~logger
record trust_system logger peer
Actions.(Sent_invalid_proof, None))
|> don't_wait_for ) ;
(* TODO should we reject the validation callback? *)
@nholland94 what do you think about this one?
Yes, we should reject here.
let%bind parent =
step
(let parent = Hashtbl.find_exn t.nodes node.parent in
match%map.Async.Deferred Ivar.read parent.result with
| Ok `Added_to_frontier ->
Ok parent.state_hash
| Error _ ->
(* TODO should we reject the validation callback? *)
@nholland94 what do you think about this one?
For this one, I think we should reject, but it may depend on the type of error we receive. Right now, this error field doesn't indicate why the node result was terminated, and some code paths terminate the node with an error result even when the error isn't caused by the block being invalid. So for now, in the interest of leaving the behavior the same as it was, let's just ignore these validation callbacks.
let make_timeout duration =
Block_time.Timeout.create t.time_controller duration ~f:(fun _ ->
(* TODO inject valid_cb ? *)
Need to remove this one
Pass the validation callback along with the block so that it is triggered as a result of the block validation process.