Store the serialized block size in the database #5911

Closed · Tracked by #7366
teor2345 opened this issue Jan 4, 2023 · 2 comments
Labels
A-rpc (Area: Remote Procedure Call interfaces)
A-state (Area: State / database changes)
C-enhancement (Category: This is an improvement)
I-slow (Problems with performance or responsiveness)

Comments

teor2345 commented Jan 4, 2023

Motivation

RPCs like getblock return the serialized block size, but we haven't found any specific clients that need it yet.

If we do find clients that need it, we want to return this size efficiently, without reading the whole block from the database. Avoiding that read would substantially improve performance for clients like lightwalletd and zebra-checkpoints, which use other fields in this RPC.

Complex Code or Requirements

We could add the serialized block size to the hash_by_height index. It only needs 4 bytes:
https://github.com/ZcashFoundation/zebra/blob/main/book/src/dev/rfcs/0005-state-updates.md#rocksdb-data-structures
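
A minimal sketch of one possible value layout, assuming we simply append the size to the existing 32-byte hash (the function name and big-endian encoding are illustrative, not Zebra's actual serialization code):

```rust
/// Hypothetical sketch: build a `hash_by_height` value that carries the
/// 32-byte block hash plus a 4-byte serialized block size.
/// Zcash blocks are limited to 2 MB, so the size always fits in a `u32`.
fn hash_by_height_value(hash: [u8; 32], serialized_block_size: u32) -> Vec<u8> {
    let mut value = Vec::with_capacity(32 + 4);
    value.extend_from_slice(&hash);
    value.extend_from_slice(&serialized_block_size.to_be_bytes());
    value
}
```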

This ticket changes the database format on disk. If we increment the state version, we'll trigger some rarely-used CI jobs, and every user will have to rebuild their entire database from scratch.

Instead, we could implement the size lookup in a backwards-compatible way: check the length of the hash_by_height value and return None if the block size is not present in the database. Then old databases would still work, and we could remove the compatibility code the next time we increment the state version.
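
A minimal sketch of that backwards-compatible read, assuming the appended-size layout above (hypothetical names, not Zebra's actual database code):

```rust
/// Hypothetical sketch: parse the block size out of a `hash_by_height` value.
/// Legacy values are 32 bytes (hash only) and return `None`; new values carry
/// 4 extra bytes holding the serialized block size.
fn block_size_from_value(value: &[u8]) -> Option<u32> {
    let size_bytes: [u8; 4] = value.get(32..36)?.try_into().ok()?;
    Some(u32::from_be_bytes(size_bytes))
}
```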

Testing

We'll need tests for reading block sizes from the database.
Writing block sizes is covered by our existing sync and cached state tests.

If we use the backwards-compatible design, we'll also need specific tests for legacy state data (without the size) and new state data (with the size).

Our existing cached state tests will exercise both of these modes, but coverage isn't guaranteed: which blocks have sizes depends on how other tests and PRs use the cached state.
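
To guarantee coverage of both layouts, we could also add small unit tests like these (a sketch against the hypothetical helpers above, not our actual test harness):

```rust
#[cfg(test)]
mod tests {
    use super::*;

    /// New-format values should round-trip the serialized block size.
    #[test]
    fn new_value_returns_block_size() {
        let value = hash_by_height_value([0u8; 32], 1_500_000);
        assert_eq!(block_size_from_value(&value), Some(1_500_000));
    }

    /// Legacy values (hash only) should read back as `None`.
    #[test]
    fn legacy_value_returns_none() {
        let legacy_value = [0u8; 32];
        assert_eq!(block_size_from_value(&legacy_value), None);
    }
}
```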

Related Work

This is a performance fix for PR #5894; we're using a workaround for now, and its performance is acceptable for zebra-checkpoints.

teor2345 added the C-enhancement, S-needs-triage, P-Low ❄️, I-slow, and A-rpc labels on Jan 4, 2023
mpguerra (Contributor) commented
teor2345 (Contributor, Author) commented

@mpguerra I think this might be optional, since we solved #5894 another way. And the performance is acceptable.

teor2345 added the A-state label on Jan 30, 2023
mpguerra removed the S-needs-triage label on Mar 16, 2023
mpguerra closed this as not planned on Oct 14, 2024