Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch transactions on persistent frontier writing #13557

Merged
merged 5 commits into from
Sep 21, 2023

Conversation

georgeee
Copy link
Member

@georgeee georgeee commented Jul 10, 2023

Problem: during testing on a private cluster, persistent frontier writes caused a series of consequent long async cycles of length 8s and 9 x 13s (total of > 100s). Long async cycles of more than 10s is a bad sign on its own, but even more so when combined in consequent groups.

Cycles are caused by a job in persistent frontier that dumps 10 blocks into RocksDB.

Explain your changes:

  • Submit changes to RocksDB in a single batch for all 10 blocks

Explain how you tested your changes:

  • Tested on a private cluster

Checklist:

  • Dependency versions are unchanged
    • Notify Velocity team if dependencies must change in CI
  • Modified the current draft of release notes with details on what is completed or incomplete within this project
  • Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? List them
  • Closes #0000

@georgeee georgeee requested a review from a team as a code owner July 10, 2023 18:06
get t.db ~key:(Arcs parent_hash) ~error:(`Not_found (`Arcs parent_hash))
match State_hash.Table.find arcs_cache parent_hash with
| None ->
get t.db ~key:(Arcs parent_hash) ~error:(`Not_found (`Arcs parent_hash))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Batching these gets will further improve the performance of this by quite a lot. This could be accomplished by splitting the add function into 2 steps, or by implementing a simple monad here (or applicative functor, but ocaml has better tooling for monads).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may try it, yep

let input =
List.filter_map input ~f:(function
| Diff.Lite.E.E (Best_tip_changed _) as diff ->
best_tip_cnt := !best_tip_cnt - 1 ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: you can write decr best_tip_cnt here (and below).

let root_cnt = ref root_cnt in
let garbage_prev = ref [] in
let input =
List.filter_map input ~f:(function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this code a bit tricky to reason about, even though I think it's correct. I'd probably prefer this to be written somewhat as follows thought:

  • partion diffs by type (best tip changed, root transitioned, others)
  • perform the diff optimizations by folding over the all elements of best tip changed except for the last (do the same with root transitioned)

I think the code will be easier to reason about if implemented this way, instead of having to count the diff types and using mutable logic in the loop to reason about which element is the last in the sequence of those types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nholland94 I implemented it with counting because I didn't want this code to rely on that last two items should be best tip change and root transitioned. And if I partition by type, how could I reconstruct the order later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you partition by type, each partition will be in the same order the elements occurred in the initial list. The counting allows you to find the last best tip change and last root transition within the full list of diffs. If they list of diffs are partitioned, then the last element of the best tip change list and the last element of the root transition list will be the same diffs you are selecting via this counting mechanism (if I am not mistaken).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, could you check my last commit? I rewrote it without counters, not sure if in the exact way you had in mind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. Let me take a stab at what I was describing and see if I can come up with something a bit clearer. Though if it ends up being more of a hassle than I think it will be, we can just defer refactoring this until later.

@georgeee georgeee force-pushed the georgeee/batch-persistent-frontier-writing branch 2 times, most recently from c896953 to 794aa92 Compare July 12, 2023 15:29
@deepthiskumar
Copy link
Member

!ci-build-me

@georgeee georgeee force-pushed the georgeee/batch-persistent-frontier-writing branch from cae8f84 to 0f5ce19 Compare July 26, 2023 13:50
@georgeee
Copy link
Member Author

!ci-build-me

@georgeee georgeee force-pushed the georgeee/batch-persistent-frontier-writing branch 2 times, most recently from 0f5ce19 to eaf52ed Compare July 26, 2023 18:17
@georgeee
Copy link
Member Author

!ci-build-me

@georgeee georgeee force-pushed the georgeee/batch-persistent-frontier-writing branch from eaf52ed to 1240e03 Compare August 25, 2023 17:29
@georgeee georgeee self-assigned this Aug 28, 2023
@nholland94
Copy link
Member

!ci-build-me

Copy link
Member Author

@georgeee georgeee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me this way. I guess I was stuck in "don't change the order" mentality in that I didn't want last best_tip_diff to occur after last root_transition if it was vice versa originally. But given we update independent parts of DB, it should be totally fine.

@georgeee georgeee force-pushed the georgeee/batch-persistent-frontier-writing branch from 71d875b to 2beb1a0 Compare September 13, 2023 20:35
@georgeee
Copy link
Member Author

!ci-build-me

@georgeee georgeee force-pushed the georgeee/batch-persistent-frontier-writing branch from 2beb1a0 to ecb005f Compare September 13, 2023 22:01
@georgeee
Copy link
Member Author

!ci-build-me

@nholland94
Copy link
Member

!approved-for-mainnet

@georgeee georgeee force-pushed the georgeee/batch-persistent-frontier-writing branch from ecb005f to b90f98e Compare September 14, 2023 09:36
@georgeee
Copy link
Member Author

!ci-build-me

@georgeee georgeee force-pushed the georgeee/batch-persistent-frontier-writing branch from b90f98e to c2b827d Compare September 15, 2023 10:56
@georgeee
Copy link
Member Author

!ci-build-me

@georgeee
Copy link
Member Author

Evidently, there is a bug in the PR caught by the medium bootstrap integration test
Need to perform some investigation

let extra_garbage =
List.drop_last root_transition_diffs
|> Option.value ~default:[]
|> List.bind ~f:(fun { new_root; garbage = Lite garbage; _ } ->
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing just_emitted log for squashed root transitions

~metadata:[ ("parent", `String (State_hash.to_base58_check h)) ] ;
Deferred.unit
| Error (`Not_found `Old_root_transition) ->
failwith
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error is copypaste and not specific

@georgeee georgeee force-pushed the georgeee/batch-persistent-frontier-writing branch from 66023c7 to 4e04a54 Compare September 19, 2023 21:19
@georgeee
Copy link
Member Author

!ci-build-me

@georgeee georgeee force-pushed the georgeee/batch-persistent-frontier-writing branch from 86225f6 to 354ab60 Compare September 20, 2023 21:24
@georgeee
Copy link
Member Author

!ci-build-me

@georgeee
Copy link
Member Author

!ci-nightly-me

2 similar comments
@georgeee
Copy link
Member Author

!ci-nightly-me

@deepthiskumar
Copy link
Member

!ci-nightly-me

@georgeee
Copy link
Member Author

@deepthiskumar
Copy link
Member

!approved-for-mainnet

@deepthiskumar deepthiskumar merged commit aaa9d6f into berkeley Sep 21, 2023
99 of 101 checks passed
@deepthiskumar deepthiskumar deleted the georgeee/batch-persistent-frontier-writing branch September 21, 2023 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants