Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass validation_callback through catchup process #10895

Merged
merged 7 commits into from
Jun 7, 2022

Conversation

georgeee
Copy link
Member

@georgeee georgeee commented May 9, 2022

Pass validation callback along with the block to make sure it's triggered as a result of block validation process.

Explain how you tested your changes:

  • Existing test suite is sufficient

Checklist:

  • Modified the current draft of release notes with details on what is completed or incomplete within this project
  • Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? Block propagation fail on blocks which enter the catchup subsystem #10867

@georgeee georgeee added the ci-build-me Add this label to trigger a circle+buildkite build for this branch label May 9, 2022
@georgeee georgeee requested a review from a team as a code owner May 9, 2022 12:11
@georgeee georgeee force-pushed the georgeee/fix-validation-timeout-for-catchup branch from a5c6662 to 20c6ac5 Compare May 9, 2022 15:31
@georgeee georgeee requested a review from a team as a code owner May 9, 2022 15:32
@georgeee georgeee force-pushed the georgeee/fix-validation-timeout-for-catchup branch from 20c6ac5 to 9715244 Compare May 9, 2022 18:20
Base automatically changed from georgeee/mina-block-package to compatible May 10, 2022 20:42
@georgeee georgeee force-pushed the georgeee/fix-validation-timeout-for-catchup branch from 9715244 to fb24096 Compare May 11, 2022 16:26
@georgeee georgeee force-pushed the georgeee/fix-validation-timeout-for-catchup branch 5 times, most recently from d44782d to c47794d Compare May 18, 2022 12:42
@georgeee georgeee force-pushed the georgeee/fix-validation-timeout-for-catchup branch from 231198f to 91470c8 Compare May 19, 2022 11:58
@georgeee georgeee changed the title [WIP] Pass validation_callback through catchup process Pass validation_callback through catchup process May 19, 2022
Copy link
Member

@nholland94 nholland94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think more work needs to be done here with how we handle errors when processing rose trees of blocks. When we have a rose tree of blocks, we don't want to reject an entire subtree just because a single block was invalid. This ends up being a bit tricky in practice. I think, as a general refactor, we should have any functions yielding errors while operating on trees of blocks return a tree of errors, so that we can map over both the input tree of blocks and output tree of errors together and make proper decisions about how to fill the validation callbacks in such scenarios. Reading the code, even besides your changes, I think there are additional bugs hidden here in how we handle errors, but those can be addressed in additional PRs (example given: if a single block in a rose tree is invalid, I believe we will drop all the valid blocks in the tree as well and not actually add them to our frontier).

Option.value_map valid_cb ~default:() ~f:(fun data ->
match Hashtbl.add t.validation_callbacks ~key:hash ~data with
| `Ok ->
don't_wait_for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: can be more cleanly written as upon (Deferred.ignore_m @@ Mina_net2.Validation_callback.await data) (fun () -> Hashtbl.remove t.validation_callbacks hash).

@@ -45,45 +47,40 @@ type t =
Cached.t
list
State_hash.Table.t
(* Validation callbacks for state hashes that are being processed *)
; validation_callbacks : Mina_net2.Validation_callback.t State_hash.Table.t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be simpler to just include this in the collected_transitions hash table? Then we don't have to maintain the state of 2 different hash tables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code would be slightly trickier, now clean-up is a dumb thing: upon resolution of callback, an entry is removed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's possible for sure

List.iter cached_transitions
~f:(Fn.compose ignore Cached.invalidate_with_failure) ) ;
Hashtbl.iter collected_transitions
~f:(List.iter ~f:(Fn.compose ignore Cached.invalidate_with_failure)) ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also fill the validation callbacks here with Ignore.

@@ -107,7 +104,8 @@ let create ~logger ~precomputed_values ~verifier ~trust_system ~frontier
$error"
~metadata:[ ("error", Error_json.error_to_yojson err) ] ;
List.iter transition_branches ~f:(fun subtree ->
Rose_tree.iter subtree ~f:(fun cached_transition ->
Rose_tree.iter subtree ~f:(fun (cached_transition, _vc) ->
(* TODO reject callback? *)
Copy link
Member

@nholland94 nholland94 May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to Reject blocks here depending on the reason they failed when building the tree of breadcrumbs.

( List.map acc ~f:Cached.invalidate_with_failure
: Mina_block.initial_valid_block Envelope.Incoming.t list ) ;
List.iter acc ~f:(fun (node, _vc) ->
(* TODO reject callback? *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it depends on the type of error triggered in verify_transition. We probably need to change the error type of that function into something more meaningful in order to handle this.

@@ -293,11 +293,13 @@ let remove_node' t (node : Node.t) =
()
| To_initial_validate _ ->
()
| To_verify c ->
| To_verify (c, _vc) ->
(* TODO should we reject the validation callback? *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fill the validation callback with Ignore here.

(let metadata =
[ ("state_hash", node.state_hash |> State_hash.to_yojson) ]
in
Option.value_map valid_cb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

(let metadata =
[ ("state_hash", node.state_hash |> State_hash.to_yojson) ]
in
Option.value_map valid_cb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

(let metadata =
[ ("state_hash", node.state_hash |> State_hash.to_yojson) ]
in
Option.value_map valid_cb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

)
|> ignore ) )
Rose_tree.iter subtree ~f:(fun (node, _vc) ->
(* TODO reject callback? *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the error.

@@ -863,6 +874,7 @@ let setup_state_machine_runner ~t ~verifier ~downloader ~logger
record trust_system logger peer
Actions.(Sent_invalid_proof, None))
|> don't_wait_for ) ;
(* TODO should we reject the validation callback? *)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nholland94 what do you think about this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should reject here.

let%bind parent =
step
(let parent = Hashtbl.find_exn t.nodes node.parent in
match%map.Async.Deferred Ivar.read parent.result with
| Ok `Added_to_frontier ->
Ok parent.state_hash
| Error _ ->
(* TODO should we reject the validation callback? *)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nholland94 what do you think about this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this one, I think we should reject, but it may depend on the type of error we receive. Right now, this error field doesn't indicate the reason the node result was terminated, and some code paths terminate the node with an error result even if the error isn't because the block was invalid. So for now, in interest of leaving the behavior the same as it was, let's just ignore these validation callbacks.

let make_timeout duration =
Block_time.Timeout.create t.time_controller duration ~f:(fun _ ->
(* TODO inject valid_cb ? *)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to remove this one

@georgeee georgeee force-pushed the georgeee/fix-validation-timeout-for-catchup branch from df30a50 to 11da8e7 Compare May 26, 2022 16:36
@georgeee georgeee force-pushed the georgeee/fix-validation-timeout-for-catchup branch from 11da8e7 to ad7a12c Compare May 27, 2022 15:27
@georgeee georgeee force-pushed the georgeee/fix-validation-timeout-for-catchup branch from ad7a12c to e7b03f3 Compare June 7, 2022 12:30
@georgeee georgeee merged commit 2b9d50f into compatible Jun 7, 2022
@georgeee georgeee deleted the georgeee/fix-validation-timeout-for-catchup branch June 7, 2022 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-build-me Add this label to trigger a circle+buildkite build for this branch code-freeze-1.3.2-eligible
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants