Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

batch insertion events in archive database #14779

Merged
merged 20 commits into from
Jan 10, 2024

Conversation

ghost-not-in-the-shell
Copy link
Contributor

@ghost-not-in-the-shell ghost-not-in-the-shell commented Jan 3, 2024

Explain your changes:
This PR does batch insertion for zkapp_events in archive db.

Since Account_update.Body.Events'.t is defined as field array list, which is ismorphic to a list of list of fields.

We could batch the insertion of field and field_array to optimize the speed of archiving max-cost zkapps.

  1. we flatten the list of list of fields to get all the field elements
  2. insert all the field elements in one query
  3. construct a map M from field_id to field by querying against the zkapp_field table
  4. use M and the list of list of fields to compute the list of list of field_ids
  5. insert all list of list of field_ids in one query
  6. construct a map M' from field_array_id to field_id array by querying against the zkapp_field_array table
  7. use M' and the list of list of field_ids to compute the list of field_array_ids
  8. insert the list of field_arrays

To patch the existing database use add_unique_constraints.sql

Explain how you tested your changes:
Tested locally and also on testworld-2-0

Checklist:

  • Dependency versions are unchanged
    • Notify Velocity team if dependencies must change in CI
  • Modified the current draft of release notes with details on what is completed or incomplete within this project
  • Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? List them
  • Closes #0000

@ghost-not-in-the-shell ghost-not-in-the-shell requested a review from a team as a code owner January 3, 2024 16:55
@ghost-not-in-the-shell ghost-not-in-the-shell requested a review from a team as a code owner January 4, 2024 22:34
@ghost-not-in-the-shell ghost-not-in-the-shell changed the title add idx for element_ids various changes to improve the performance of archive process Jan 4, 2024
@ghost-not-in-the-shell
Copy link
Contributor Author

!ci-build-me

in
let increment_nonce = body.increment_nonce in
let%bind events_id =
Zkapp_events.add_if_doesn't_exist (module Conn) body.events
Metrics.time ~label:"Zkapp_events.add"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all of these Metrics.time?

@@ -287,15 +287,19 @@ let select_insert_into_cols ~(select : string * 'select Caqti_type.t)
@@ select_cols ~select:(fst select) ~table_name ?tannot ~cols:(fst cols) ()
)
value
>>= function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous code with >>= function looked cleaner

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I changed it to add some more logs. Let me revert those changes.

@@ insert_into_cols ~returning:(fst select) ~table_name ?tannot
~cols:(fst cols) () )
value
let%map res =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant let%map introduced

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this change was for logging. Let me undo that.

()
>>| String.Map.of_alist_exn
in
let field_id_list_list =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let field_id_list_list =

field_id_list_list is not needed at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a let binding here to make the code more readable. Just to give a name to the result of that computation.

WHERE element_ids in (%s) |sql}
@@ sep_by_comma field_array_list
in
let module Field_array_map = Map.Make (struct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this module to the module-level context (so that it's not defined in the middle of function definition)?

let%bind field_id_list_list =
if not @@ List.is_empty fields then
let field_insert =
sprintf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can generalize the pattern and extract it as a separate function to reduce code duplication?

I'm taking about insert on conflict do nothing paired with select of some_field, id with conversion of it to map preserving the initial order.

Copy link
Contributor Author

@ghost-not-in-the-shell ghost-not-in-the-shell Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to do that. In fact, Mina_caqti.select_insert_into_cols did a similar abstraction over 2 queries. But for this one, it's a bit harder, because I was hand-written some of the query and didn't get to use caqti's typ stuff. Let me leave a comment around this part of code saying we could abstract this pattern out once I figure out how to use typ to convert between values and sql representations. I think we want to merge this soon so let create a ticket for do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I got most part of the refactor done. I left out the conversion from value to string.

@ghost-not-in-the-shell ghost-not-in-the-shell changed the title various changes to improve the performance of archive process batch insertion events in archive database Jan 5, 2024
@ghost-not-in-the-shell
Copy link
Contributor Author

!ci-build-me

@ghost-not-in-the-shell
Copy link
Contributor Author

!ci-build-me

Copy link
Member

@deepthiskumar deepthiskumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for rampup8 release before landing this

@ghost-not-in-the-shell
Copy link
Contributor Author

!ci-build-me

@deepthiskumar deepthiskumar dismissed their stale review January 9, 2024 21:48

Include in rampup8

@deepthiskumar
Copy link
Member

!ci-nightly-me

@deepthiskumar
Copy link
Member

!ci-build-me

@ghost-not-in-the-shell
Copy link
Contributor Author

!ci-build-me

@ghost-not-in-the-shell
Copy link
Contributor Author

!ci-nightly-me

@ghost-not-in-the-shell ghost-not-in-the-shell changed the base branch from rampup to berkeley January 10, 2024 15:41
@ghost-not-in-the-shell
Copy link
Contributor Author

!ci-nightly-me

@ghost-not-in-the-shell ghost-not-in-the-shell changed the base branch from berkeley to rampup January 10, 2024 15:42
@ghost-not-in-the-shell ghost-not-in-the-shell changed the base branch from rampup to berkeley January 10, 2024 15:49
@ghost-not-in-the-shell
Copy link
Contributor Author

!ci-nightly-me

@ghost-not-in-the-shell ghost-not-in-the-shell changed the base branch from berkeley to rampup January 10, 2024 15:53
@ghost-not-in-the-shell
Copy link
Contributor Author

!ci-build-me

@deepthiskumar
Copy link
Member

!approved-for-mainnet

@ghost-not-in-the-shell ghost-not-in-the-shell merged commit ceaae53 into rampup Jan 10, 2024
36 checks passed
@ghost-not-in-the-shell ghost-not-in-the-shell deleted the fix/archive-db-slowness branch January 10, 2024 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants