Benchmark C-Chain with height-indexed db vs leveldb-only blocks #4523

@DracoLi

Context

As part of the Height-Indexed Database C-Chain Integration work, we now have an EVM database wrapper that sends block headers/bodies/receipts to a height-indexed db while keeping non-block data on the existing leveldb-backed KV store.

This issue is about measuring what changes when we turn that on: block insertion time, disk usage, and node resource usage, compared to when all block data is stored in leveldb.

Questions we want to answer

  • Block insertion time and throughput
    • Does enabling the height-indexed db change C-Chain block insertion time (and therefore block verify + accept time)?
    • Does it change the variance of block insertion time (do block insertions become more or less “spiky”)?
    • Does enabling the height-indexed db let us process more blocks overall (for example, does mgas/s change when running the reexecution tests)?
  • On-disk storage usage
    • What is the difference in total on-disk storage between:
      • State sync off, storing all C-Chain blocks (72M+ blocks).
      • State sync on, only storing blocks after the state sync height.
    • For each of the above, how do leveldb-only and height-indexed-db-enabled setups compare, and how quickly do they grow over time?
  • Node resource usage and compactions
    • How does moving block data out of leveldb change compaction behavior (frequency, duration, bytes written, and write stalls)?
    • Do we see a meaningful change in CPU, memory, disk IO, or db-level latency that we can reasonably attribute to those compaction changes?
  • Migration cost
    • For a node with state sync off (storing all C-Chain blocks), how long does it take to migrate into the height-indexed db, and what is the resource profile?
    • For a node with state sync on (only storing blocks after the state sync height), how long does migration take, and how does its resource profile compare?
    • In both cases, how disruptive is migration to the node (for example, does it noticeably affect responsiveness)?
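
As a rough illustration of how the insertion-time and throughput questions could be compared across the two configurations, here is a minimal Python sketch. It assumes block insertion times have already been exported as a list of millisecond samples per configuration, and that the reexecution run reports total gas and elapsed wall-clock time. The function and field names are hypothetical and are not part of any existing tooling.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class InsertionSummary:
    mean_ms: float
    stdev_ms: float
    cv: float  # coefficient of variation (stdev / mean), a unitless "spikiness" measure


def summarize(samples_ms: Sequence[float]) -> InsertionSummary:
    """Summarize block insertion times for one configuration (hypothetical helper)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to compute a standard deviation")
    mean = statistics.fmean(samples_ms)
    stdev = statistics.stdev(samples_ms)  # sample standard deviation
    return InsertionSummary(mean_ms=mean, stdev_ms=stdev, cv=stdev / mean)


def mgas_per_second(total_gas: int, elapsed_seconds: float) -> float:
    """Throughput in millions of gas per second for a reexecution run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_gas / 1_000_000 / elapsed_seconds
```

Comparing `cv` between the leveldb-only and height-indexed runs gives one way to say whether insertions became more or less spiky, independent of any shift in the mean.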

Metrics to collect

  • Block insertion time and throughput
    • For C-Chain nodes with the height-indexed db enabled vs disabled:
      • Mean block insertion time over the last X days.
      • Standard deviation of block insertion time over the same period (to capture variance).
    • Reexecution tests:
      • mgas/s reported by the reexecution tests with and without the height-indexed db.
  • On-disk storage usage
    • For archive nodes (state sync off, storing all C-Chain blocks):
      • Total db size for leveldb-only vs leveldb + height-indexed db.
    • For nodes with state sync on (only storing blocks after the state sync height):
      • Total db size for leveldb-only vs leveldb + height-indexed db.
    • For both node types:
      • Approximate growth rate over a defined period (for example, size change after X additional blocks or Y days).
  • Node resource usage and compactions
    • Leveldb compaction + stall metrics:
      • writes_delayed, writes_delayed_duration, and write_delayed to see how often and how long writes are stalled by compaction.
      • io_write and io_read to track total compaction IO.
      • Per-level stats: duration, reads, writes along with mem_comps, level_0_comps, non_level_0_comps, seek_comps to understand where compaction work is happening.
    • CPU and memory:
      • CPU usage over time (average and peaks), memory usage (RSS and Go heap) for both configurations.
    • Disk IO and db latency:
      • Read/write throughput and IO latency for both configurations.
      • If available, average and tail latency for db Get/Put-style operations, to see whether heavy compaction periods line up with slower db operations.
  • Migration cost
    • Total migration time for:
      • A node with state sync off.
      • A node with state sync on.
    • Migration throughput:
      • Blocks migrated per second and/or bytes migrated per second.
    • Resource usage during migration:
      • CPU, memory, and disk IO while migration runs.
      • Any noticeable impact on node responsiveness (for example, RPC latency) during migration if run on a live node.
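
Several of the metrics above are rates derived from two observations of the same node. The Python sketch below shows one way those could be computed. The `Snapshot` type and its field names are hypothetical, and it assumes that `writes_delayed_duration` is a cumulative counter in nanoseconds; that unit should be checked against the actual metric definition before relying on the result.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One observation of a node. All field names are hypothetical."""
    unix_seconds: float
    db_size_bytes: int               # total on-disk size of the db directory
    block_height: int                # last accepted C-Chain height
    writes_delayed_duration_ns: int  # ASSUMED cumulative and in nanoseconds


def _elapsed_seconds(start: Snapshot, end: Snapshot) -> float:
    elapsed = end.unix_seconds - start.unix_seconds
    if elapsed <= 0:
        raise ValueError("end snapshot must be later than start snapshot")
    return elapsed


def growth_bytes_per_block(start: Snapshot, end: Snapshot) -> float:
    """Size change per additional block between two snapshots."""
    blocks = end.block_height - start.block_height
    if blocks <= 0:
        raise ValueError("end snapshot must be at a higher block height")
    return (end.db_size_bytes - start.db_size_bytes) / blocks


def growth_bytes_per_day(start: Snapshot, end: Snapshot) -> float:
    """Size change normalized to a 24-hour period."""
    size_delta = end.db_size_bytes - start.db_size_bytes
    return size_delta * 86_400 / _elapsed_seconds(start, end)


def stall_fraction(start: Snapshot, end: Snapshot) -> float:
    """Fraction of wall-clock time during which writes were delayed."""
    stalled_ns = end.writes_delayed_duration_ns - start.writes_delayed_duration_ns
    return stalled_ns / (_elapsed_seconds(start, end) * 1e9)


def migration_blocks_per_second(blocks_migrated: int, elapsed_seconds: float) -> float:
    """Migration throughput in blocks per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return blocks_migrated / elapsed_seconds
```

Taking both snapshots from the same node and window for each configuration keeps the comparison like for like; a negative growth value would indicate that compaction or pruning reclaimed more space than new blocks added.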

Labels

monitoring: This primarily focuses on logs, metrics, and/or tracing

Status

In Progress 🏗️
