Batch transactions on persistent frontier writing #13557
- get t.db ~key:(Arcs parent_hash) ~error:(`Not_found (`Arcs parent_hash))
+ match State_hash.Table.find arcs_cache parent_hash with
+ | None ->
+   get t.db ~key:(Arcs parent_hash) ~error:(`Not_found (`Arcs parent_hash))
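The new code consults `arcs_cache` before falling back to the database. A minimal sketch of that cache-first lookup follows, using the stdlib `Hashtbl` and a hypothetical `db_get` function in place of the real RocksDB access. The assumption that new arcs are recorded in the cache and only reach the database when the whole batch is committed is made here for illustration; the thread does not state it.

```ocaml
(* Look up the arcs (child hashes) of [parent_hash], consulting the
   in-memory cache before the database. [db_get] is a hypothetical
   stand-in for the real database read. *)
let find_arcs ~arcs_cache ~db_get parent_hash =
  match Hashtbl.find_opt arcs_cache parent_hash with
  | Some arcs -> Ok arcs
  | None -> (
      match db_get parent_hash with
      | Some arcs -> Ok arcs
      | None -> Error (`Not_found (`Arcs parent_hash)))

(* Record a new child in the cache only; in this sketch the database is
   assumed to be written once, when the whole batch is committed. *)
let add_child ~arcs_cache ~db_get ~parent ~child =
  match find_arcs ~arcs_cache ~db_get parent with
  | Ok arcs ->
      Hashtbl.replace arcs_cache parent (child :: arcs) ;
      Ok ()
  | Error e -> Error e
```

In this sketch, repeated additions under the same parent read from the database only once; later lookups see the children accumulated in the cache.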
Batching these gets would improve the performance here by quite a lot more. This could be accomplished by splitting the `add` function into two steps, or by implementing a simple monad here (or an applicative functor, but OCaml has better tooling for monads).
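A minimal sketch of the applicative approach, assuming string keys and values and a hypothetical `multi_get` function standing in for a batched database read (RocksDB does offer a MultiGet operation, but the binding and signature here are invented for illustration). Each computation declares its keys up front, so combining computations with `both` merges their reads into a single batch.

```ocaml
(* A minimal applicative for batched reads: a computation declares the keys
   it needs up front and computes its result from a lookup over them. *)
module Batched_read = struct
  type 'a t = { keys : string list; run : (string -> string option) -> 'a }

  let return x = { keys = []; run = (fun _ -> x) }

  let get key = { keys = [ key ]; run = (fun lookup -> lookup key) }

  let map t ~f = { keys = t.keys; run = (fun lookup -> f (t.run lookup)) }

  (* Combining two computations concatenates their keys, so the reads of
     both end up in the same batch. *)
  let both a b =
    { keys = a.keys @ b.keys
    ; run = (fun lookup -> (a.run lookup, b.run lookup))
    }

  (* Run a computation with exactly one call to [multi_get], which must
     return one result per requested key, in the same order
     (List.iter2 raises Invalid_argument otherwise). *)
  let execute t ~multi_get =
    let keys = List.sort_uniq compare t.keys in
    let table = Hashtbl.create (List.length keys) in
    List.iter2 (Hashtbl.replace table) keys (multi_get keys) ;
    t.run (fun key -> Option.join (Hashtbl.find_opt table key))
end
```

For example, `Batched_read.both (get k1) (get k2)` performs a single `multi_get` over both keys. An applicative is enough when the keys do not depend on earlier results; with a monad, a later key may depend on a value already read, and such reads cannot be merged into the same batch.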
I may try it, yep
let input =
  List.filter_map input ~f:(function
    | Diff.Lite.E.E (Best_tip_changed _) as diff ->
      best_tip_cnt := !best_tip_cnt - 1 ;
Nit: you can write `decr best_tip_cnt` here (and below).
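For reference, `decr` (and its counterpart `incr`) comes from OCaml's Stdlib and is equivalent to the explicit assignment; the two function names below are illustrative only.

```ocaml
(* Explicit assignment, as in the current code. *)
let decrement_with_assignment counter = counter := !counter - 1

(* Stdlib shorthand with the same effect on an [int ref]. *)
let decrement_with_decr counter = decr counter
```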
let root_cnt = ref root_cnt in
let garbage_prev = ref [] in
let input =
  List.filter_map input ~f:(function
I find this code a bit tricky to reason about, even though I think it's correct. I'd probably prefer it to be written roughly as follows, though:
- partition the diffs by type (best tip changed, root transitioned, others)
- perform the diff optimizations by folding over all elements of the best tip changed partition except the last (and do the same with root transitioned)
I think the code will be easier to reason about if implemented this way, instead of having to count the diff types and use mutable logic in the loop to determine which element is the last in the sequence of each type.
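A rough sketch of that partition-based structure, using a simplified, hypothetical `diff` type in place of `Diff.Lite.E.E`. Two choices are assumptions of the sketch rather than something stated in this thread: the garbage of earlier root transitions is folded into the last one, and the squashed best tip and root diffs are placed after all other diffs.

```ocaml
(* Simplified, hypothetical stand-in for the frontier diff type. *)
type diff =
  | Best_tip_changed of string
  | Root_transitioned of { new_root : string; garbage : string list }
  | Other of string

(* Split diffs into three lists, each keeping the original relative order. *)
let partition diffs =
  List.fold_right
    (fun d (best_tips, roots, others) ->
      match d with
      | Best_tip_changed _ -> (d :: best_tips, roots, others)
      | Root_transitioned _ -> (best_tips, d :: roots, others)
      | Other _ -> (best_tips, roots, d :: others))
    diffs ([], [], [])

(* Split a list into the elements before the last one, and the last one. *)
let split_last l =
  match List.rev l with
  | [] -> None
  | last :: rev_init -> Some (List.rev rev_init, last)

(* Keep every [Other] diff, only the last best tip change, and a single
   root transition carrying the garbage of the earlier ones. *)
let squash diffs =
  let best_tips, roots, others = partition diffs in
  let best_tip =
    match split_last best_tips with None -> [] | Some (_, last) -> [ last ]
  in
  let root =
    match split_last roots with
    | Some (earlier, Root_transitioned { new_root; garbage }) ->
        let extra_garbage =
          List.concat_map
            (function Root_transitioned r -> r.garbage | _ -> [])
            earlier
        in
        [ Root_transitioned { new_root; garbage = extra_garbage @ garbage } ]
    | _ -> []
  in
  others @ best_tip @ root
```

No counters or mutable state are needed: "the last one of its type" is simply the last element of the corresponding partition.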
@nholland94 I implemented it with counting because I didn't want this code to rely on the last two items being the best tip change and the root transition. And if I partition by type, how could I reconstruct the order later?
When you partition by type, each partition will be in the same order in which the elements occurred in the initial list. The counting lets you find the last best tip change and the last root transition within the full list of diffs. If the list of diffs is partitioned, then the last element of the best tip change list and the last element of the root transition list will be the same diffs you are selecting via this counting mechanism (if I am not mistaken).
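This equivalence can be checked directly. In the sketch below (helper names are hypothetical), `last_by_counting` mimics the counter-based selection and `last_by_partition` uses `List.partition`, which preserves the relative order of elements; both select the same element.

```ocaml
(* Counting approach: count the matching elements, then walk the list and
   stop where the remaining count reaches zero, i.e. at the last match. *)
let last_by_counting pred l =
  let remaining = ref (List.length (List.filter pred l)) in
  List.find_opt
    (fun x ->
      if pred x then (
        decr remaining ;
        !remaining = 0)
      else false)
    l

(* Partition approach: List.partition keeps the original relative order,
   so the last element of the matching partition is the last match. *)
let last_by_partition pred l =
  let matching, _ = List.partition pred l in
  match List.rev matching with [] -> None | x :: _ -> Some x
```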
So, could you check my last commit? I rewrote it without counters, though I'm not sure it's exactly the way you had in mind.
Not quite. Let me take a stab at what I was describing and see if I can come up with something a bit clearer. Though if it ends up being more of a hassle than I think it will be, we can just defer refactoring this until later.
Force-pushed from c896953 to 794aa92
!ci-build-me
Force-pushed from cae8f84 to 0f5ce19
!ci-build-me
Force-pushed from 0f5ce19 to eaf52ed
!ci-build-me
Force-pushed from eaf52ed to 1240e03
!ci-build-me
Fine by me this way. I guess I was stuck in a "don't change the order" mentality, in that I didn't want the last `best_tip_diff` to occur after the last `root_transition` if it was the other way around originally. But given that we update independent parts of the DB, it should be totally fine.
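The argument rests on writes to disjoint keys commuting. A toy check with an immutable `String_map`, where the `"best_tip"` and `"root"` key names are hypothetical stand-ins for the real DB schema: applying the two writes in either order yields the same state, while two writes to the same key do not commute.

```ocaml
module String_map = Map.Make (String)

(* Apply a sequence of (key, value) puts to an immutable map, in order;
   a later put to the same key overwrites an earlier one. *)
let apply_writes db writes =
  List.fold_left (fun db (key, value) -> String_map.add key value db) db writes
```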
Force-pushed from 71d875b to 2beb1a0
!ci-build-me
Force-pushed from 2beb1a0 to ecb005f
!ci-build-me
!approved-for-mainnet
Force-pushed from ecb005f to b90f98e
!ci-build-me
Force-pushed from b90f98e to c2b827d
!ci-build-me
Evidently, there is a bug in this PR; it was caught by the medium bootstrap integration test.
let extra_garbage =
  List.drop_last root_transition_diffs
  |> Option.value ~default:[]
  |> List.bind ~f:(fun { new_root; garbage = Lite garbage; _ } ->
Missing `just_emitted` log for squashed root transitions.
  ~metadata:[ ("parent", `String (State_hash.to_base58_check h)) ] ;
Deferred.unit
| Error (`Not_found `Old_root_transition) ->
  failwith
The error message is copy-pasted and not specific to this case.
Force-pushed from 66023c7 to 4e04a54
!ci-build-me
1. Simplify code of batch writes
2. Use batch get
Force-pushed from 86225f6 to 354ab60
!ci-build-me
!ci-nightly-me
Confirmed that the medium bootstrap test is now passing: https://buildkite.com/o-1-labs-2/mina-end-to-end-nightlies/builds/508#018ab6fa-6a16-4813-926b-f4d068423d35
!approved-for-mainnet
Problem: during testing on a private cluster, persistent frontier writes caused a series of consecutive long async cycles of lengths 8 s and 9 × 13 s (more than 100 s in total). Long async cycles of more than 10 s are a bad sign on their own, and even more so when they occur in consecutive groups.
The cycles are caused by a job in the persistent frontier that dumps 10 blocks into RocksDB.
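A toy model of why batching can help, assuming the cost is dominated by the number of separately committed transactions. The store below is a hypothetical in-memory stand-in for RocksDB that only counts commits; it is not the persistent frontier's actual code. Writing 10 blocks one transaction at a time costs 10 commits, while grouping them into a single batch costs one, with the same final contents.

```ocaml
(* Hypothetical in-memory store that counts how many commits it receives. *)
type store = { data : (string, string) Hashtbl.t; mutable commits : int }

let create () = { data = Hashtbl.create 16; commits = 0 }

(* Apply a list of (key, value) puts as a single commit. *)
let commit_batch store puts =
  List.iter (fun (key, value) -> Hashtbl.replace store.data key value) puts ;
  store.commits <- store.commits + 1

(* Unbatched: one commit per block. *)
let write_unbatched store blocks =
  List.iter (fun block -> commit_batch store [ block ]) blocks

(* Batched: all blocks in a single commit. *)
let write_batched store blocks = commit_batch store blocks
```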