diagnostic: Log column family and database size on startup and shutdown #7416

Open
Tracked by #6642
teor2345 opened this issue Aug 29, 2023 · 3 comments
Labels
A-diagnostics (Area: Diagnosing issues or monitoring performance)
A-state (Area: State / database changes)
C-enhancement (Category: This is an improvement)
E-help-wanted (Call for participation: Help is requested to fix this issue.)
good first issue
I-usability (Zebra is hard to understand or use)
S-needs-triage (Status: A bug report needs triage)

Comments

@teor2345
Contributor

Motivation

Some Zebra users are concerned about the size of the on-disk database, particularly miners (#5718). Others are concerned about memory usage. Zebra developers also need to monitor database and column family sizes as part of state upgrades.

It would be useful to print the total database size, and the size of each column family, on disk and in memory.

We could print it at startup and shutdown.

Specifications

RocksDB has an API for reading integer properties of each column family:
https://docs.rs/rocksdb/latest/rocksdb/struct.DBCommon.html#method.property_int_value_cf

We can get live and total disk size using these properties:
https://docs.rs/rocksdb/latest/rocksdb/properties/constant.ESTIMATE_LIVE_DATA_SIZE.html
https://docs.rs/rocksdb/latest/rocksdb/properties/constant.TOTAL_SST_FILES_SIZE.html

And memory size (why not?) using this property:
https://docs.rs/rocksdb/latest/rocksdb/properties/constant.SIZE_ALL_MEM_TABLES.html

Complex Code or Requirements

To get the total size, we need to iterate through each column family, including the default column family, then add the values.
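The summing step could be sketched roughly as below. This is only an illustration, not Zebra's implementation: `SizeReport`, `collect_sizes`, and the `read_property` closure are hypothetical names, and the closure stands in for a call to `property_int_value_cf` on the real database so the sketch does not depend on the `rocksdb` crate. The three strings are assumed to match the values of the `rocksdb::properties` constants linked above.

```rust
use std::collections::BTreeMap;

// Property names, assumed to match the `rocksdb::properties` constants
// ESTIMATE_LIVE_DATA_SIZE, TOTAL_SST_FILES_SIZE, and SIZE_ALL_MEM_TABLES.
pub const ESTIMATE_LIVE_DATA_SIZE: &str = "rocksdb.estimate-live-data-size";
pub const TOTAL_SST_FILES_SIZE: &str = "rocksdb.total-sst-files-size";
pub const SIZE_ALL_MEM_TABLES: &str = "rocksdb.size-all-mem-tables";

/// Sizes in bytes for one column family, or for the whole database.
/// (Hypothetical type, used only in this sketch.)
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct SizeReport {
    pub live_disk_bytes: u64,
    pub total_disk_bytes: u64,
    pub memory_bytes: u64,
}

/// Reads the three size properties for every column family in `cf_names`
/// (the caller must include "default"), and returns the per-column-family
/// reports together with their sum.
///
/// `read_property(cf_name, property_name)` is a stand-in for the database
/// lookup. A missing value (`None`) is counted as zero.
pub fn collect_sizes<F>(
    cf_names: &[&str],
    mut read_property: F,
) -> (BTreeMap<String, SizeReport>, SizeReport)
where
    F: FnMut(&str, &str) -> Option<u64>,
{
    let mut per_cf = BTreeMap::new();
    let mut total = SizeReport::default();

    for &cf in cf_names {
        let report = SizeReport {
            live_disk_bytes: read_property(cf, ESTIMATE_LIVE_DATA_SIZE).unwrap_or(0),
            total_disk_bytes: read_property(cf, TOTAL_SST_FILES_SIZE).unwrap_or(0),
            memory_bytes: read_property(cf, SIZE_ALL_MEM_TABLES).unwrap_or(0),
        };

        // Saturating adds avoid a panic on overflow in debug builds.
        total.live_disk_bytes = total.live_disk_bytes.saturating_add(report.live_disk_bytes);
        total.total_disk_bytes = total.total_disk_bytes.saturating_add(report.total_disk_bytes);
        total.memory_bytes = total.memory_bytes.saturating_add(report.memory_bytes);

        per_cf.insert(cf.to_string(), report);
    }

    (per_cf, total)
}
```

In real code, the closure would look up the column family handle by name and forward to the database, and the returned reports would be logged at startup and shutdown.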

Testing

Manually compare the total with the size on disk using du, and the size in memory using top.

RocksDB uses extra files for old data and deleted data, so the disk sizes reported by RocksDB should be smaller than the size reported by du. The live disk size should also be smaller than the total disk size.

Zebra uses memory outside RocksDB, so the RocksDB memory usage should be smaller than the process memory reported by top.
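The expectations above can be written down as a small check, which might help when comparing numbers by hand. This is a sketch under assumptions: `Measurements` and `unexpected_sizes` are hypothetical names, the `du` and `top` values are entered manually, and because the RocksDB figures are estimates, a flagged item is a hint to look closer rather than proof of a bug.

```rust
/// Manually collected sizes in bytes. (Hypothetical type for this sketch.)
#[derive(Clone, Copy, Debug)]
pub struct Measurements {
    /// Sum of the live data size estimates over all column families.
    pub rocksdb_live_disk: u64,
    /// Sum of the total SST file sizes over all column families.
    pub rocksdb_total_disk: u64,
    /// Size of the database directory, as reported by `du`.
    pub du_disk: u64,
    /// Sum of the memtable sizes over all column families.
    pub rocksdb_memory: u64,
    /// Memory used by the whole Zebra process, as reported by `top`.
    pub process_memory: u64,
}

/// Returns a description of every expectation that does not hold.
/// An empty list means the numbers look plausible.
pub fn unexpected_sizes(m: &Measurements) -> Vec<&'static str> {
    let mut problems = Vec::new();

    if m.rocksdb_live_disk > m.rocksdb_total_disk {
        problems.push("live disk size is larger than total disk size");
    }
    if m.rocksdb_total_disk > m.du_disk {
        problems.push("RocksDB total disk size is larger than the du size");
    }
    if m.rocksdb_memory > m.process_memory {
        problems.push("RocksDB memory size is larger than the process memory");
    }

    problems
}
```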

Related Work

We might also want to print the memory or disk usage regularly, but that's out of scope for this ticket. Memory usage can vary a lot depending on what operations Zebra is doing.

@teor2345 teor2345 added C-enhancement Category: This is an improvement S-needs-triage Status: A bug report needs triage P-Low ❄️ I-usability Zebra is hard to understand or use A-diagnostics Area: Diagnosing issues or monitoring performance A-state Area: State / database changes labels Aug 29, 2023
@teor2345 teor2345 self-assigned this Aug 29, 2023
@teor2345 teor2345 removed their assignment Sep 4, 2023
@teor2345 teor2345 added good first issue E-help-wanted Call for participation: Help is requested to fix this issue. labels Jan 1, 2024
@elijahhampton
Contributor

Is this ticket available for work? @teor2345 @mpguerra

@mpguerra
Contributor

> Is this ticket available for work? @teor2345 @mpguerra

Yes, it is

@elijahhampton
Contributor

>> Is this ticket available for work? @teor2345 @mpguerra
>
> Yes, it is

Awesome. I just created a PR.
