feat(rocksdb): support ToplingDB provider#3024

Open
WaterWhisperer wants to merge 54 commits into apache:master from WaterWhisperer:toplingdb-rebase

WaterWhisperer commented May 12, 2026

Purpose of the PR

Add ToplingDB support as an optional RocksDB-compatible storage path.

This PR rebases and prepares the existing ToplingDB adaptation work for review on the current master. ToplingDB is integrated as an optional provider for RocksDB-based storage, allowing users to enable RocksDB/ToplingDB configuration and the ToplingDB Web UI while preserving the standard RocksDB path.

Main Changes

  • Add hugegraph-rocksdb-provider module
  • Update HugeGraph Server RocksDB integration
  • Add ToplingDB preload support in server dist
  • Add PD / Store ToplingDB integration
  • Add Maven / CI / dependency metadata updates
  • Add ToplingDB docs and specs

Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • Built packaged dist artifacts for Server / PD / Store
    • Verified packaged rocksdbjni-8.10.2-SNAPSHOT.jar contains ToplingDB resources including org/rocksdb/SidePluginRepo.class
    • Ran Server with ToplingDB enabled
    • Ran Server without rocksdb.option_path
    • Ran PD / Store with ToplingDB config

Does this PR potentially affect the following parts?

  • Dependencies (add/update license info & regenerate_known_dependencies.sh)
  • Modify configurations
  • The public API
  • Other effects (typed here)
    • Adds native library preload logic for ToplingDB-enabled RocksDB JNI runtime.
    • Adds optional ToplingDB Web UI support.
    • Adds RocksDB provider SPI and routes RocksDB opening through the provider loader.
  • Nope

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

syslucas added 30 commits May 11, 2026 00:05
dosubot (bot) added the size:XXL, dependencies, feature, pd, and store labels May 12, 2026
WaterWhisperer marked this pull request as draft May 13, 2026 07:38
WaterWhisperer marked this pull request as ready for review May 24, 2026 16:49
imbajin requested a review from Copilot May 27, 2026 10:19
Contributor

Copilot AI left a comment

Pull request overview

Adds an SPI-based RocksDB provider layer to support ToplingDB as an optional RocksDB-compatible backend, including config options for Topling YAML parameters and optional HTTP UI, plus distro/CI packaging changes to preload native libraries and web assets.

Changes:

  • Introduce hugegraph-rocksdb-provider module with provider discovery/selection and Standard/Topling providers.
  • Route Server/Store/PD RocksDB open/close through the provider loader and add rocksdb.option_path / rocksdb.open_http configs.
  • Add preload scripts/resources and CI workflow updates to support ToplingDB artifacts and runtime prerequisites.

Reviewed changes

Copilot reviewed 53 out of 61 changed files in this pull request and generated 11 comments.

File Description
pom.xml Adds provider module, sets rocksdbjni.version, and excludes extracted Topling HTML/CSS from checks.
install-dist/scripts/dependency/known-dependencies.txt Updates known dependency inventory for new/updated jars (incl. snapshot rocksdbjni).
install-dist/release-docs/licenses/LICENSE-rocksdbjni-8.10.2-SNAPSHOT.txt Adds license text for the snapshot rocksdbjni distribution.
install-dist/release-docs/LICENSE Updates LICENSE entries for new/updated dependencies and rocksdbjni source.
hugegraph-store/hg-store-rocksdb/src/main/java/org/apache/hugegraph/rocksdb/access/RocksDBSession.java Switches RocksDB open/close to RocksDBProviderLoader and passes option/http params.
hugegraph-store/hg-store-rocksdb/src/main/java/org/apache/hugegraph/rocksdb/access/RocksDBOptions.java Adds rocksdb.option_path and rocksdb.open_http options for store.
hugegraph-store/hg-store-rocksdb/pom.xml Replaces direct rocksdbjni dependency with hugegraph-rocksdb-provider.
hugegraph-store/hg-store-node/src/main/java/org/apache/hugegraph/store/node/metrics/RocksDBMetricsConst.java Removes metrics constants no longer available in RocksDB 8.x.
hugegraph-store/hg-store-dist/src/assembly/static/conf/rocksdb_store.yaml Adds sample Topling/RocksDB YAML config for store.
hugegraph-store/hg-store-dist/src/assembly/static/conf/application-pd.yml Documents new RocksDB option_path/open_http config fields.
hugegraph-store/hg-store-dist/src/assembly/static/bin/util.sh Adds helper to locate server dist dir for preload integration.
hugegraph-store/hg-store-dist/src/assembly/static/bin/start-hugegraph-store.sh Sources server preload script when available before starting store.
hugegraph-store/hg-store-core/pom.xml Updates jraft-core version.
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/unit/rocksdb/RocksDBSessionsTest.java Formatting-only update in test code.
hugegraph-server/hugegraph-rocksdb/src/main/java/org/apache/hugegraph/backend/store/rocksdb/RocksDBStore.java Formatting-only change in signature wrapping.
hugegraph-server/hugegraph-rocksdb/src/main/java/org/apache/hugegraph/backend/store/rocksdb/RocksDBStdSessions.java Opens RocksDB via provider loader and gates Topling HTTP to GRAPH_STORE.
hugegraph-server/hugegraph-rocksdb/src/main/java/org/apache/hugegraph/backend/store/rocksdb/RocksDBOptions.java Adds rocksdb.option_path and rocksdb.open_http options for server.
hugegraph-server/hugegraph-rocksdb/src/main/java/org/apache/hugegraph/backend/store/rocksdb/OpenedRocksDB.java Closes RocksDB via provider loader to enable provider-specific cleanup.
hugegraph-server/hugegraph-rocksdb/pom.xml Replaces direct rocksdbjni dependency with hugegraph-rocksdb-provider.
hugegraph-server/hugegraph-dist/src/assembly/travis/run-unit-test.sh Installs RocksDB/Topling runtime prereqs before running unit tests.
hugegraph-server/hugegraph-dist/src/assembly/travis/run-core-test.sh Installs RocksDB/Topling runtime prereqs before running core tests.
hugegraph-server/hugegraph-dist/src/assembly/travis/run-api-test.sh Installs RocksDB/Topling runtime prereqs before API tests.
hugegraph-server/hugegraph-dist/src/assembly/travis/install-rocksdb.sh New helper to preload Topling libs/resources in CI.
hugegraph-server/hugegraph-dist/src/assembly/travis/install-deps.sh New helper to install native deps (liburing/libaio/jemalloc) in CI.
hugegraph-server/hugegraph-dist/src/assembly/static/conf/graphs/rocksdb_server.yaml Adds sample Topling/RocksDB YAML config for server.
hugegraph-server/hugegraph-dist/src/assembly/static/conf/graphs/hugegraph.properties Documents new rocksdb.option_path / rocksdb.open_http properties.
hugegraph-server/hugegraph-dist/src/assembly/static/bin/start-hugegraph.sh Sources preload script before starting server.
hugegraph-server/hugegraph-dist/src/assembly/static/bin/preload-topling.sh New preload entrypoint to extract/preload native libs + resources.
hugegraph-server/hugegraph-dist/src/assembly/static/bin/init-store.sh Sources preload script before store initialization.
hugegraph-server/hugegraph-dist/src/assembly/static/bin/common-topling.sh New shared functions for extracting/preloading libs/resources and compat workarounds.
hugegraph-server/hugegraph-core/pom.xml Updates jraft version property.
hugegraph-rocksdb-provider/src/main/resources/META-INF/services/org.apache.hugegraph.rocksdb.provider.RocksDBProvider Registers Standard and Topling providers for SPI discovery.
hugegraph-rocksdb-provider/src/main/java/org/apache/hugegraph/rocksdb/provider/ToplingRocksDBProvider.java Implements Topling SidePluginRepo reflection, YAML validation, and optional HTTP start.
hugegraph-rocksdb-provider/src/main/java/org/apache/hugegraph/rocksdb/provider/StandardRocksDBProvider.java Implements standard RocksDB open/close behavior.
hugegraph-rocksdb-provider/src/main/java/org/apache/hugegraph/rocksdb/provider/RocksDBProviderLoader.java Adds provider discovery, priority selection, and open/close delegates.
hugegraph-rocksdb-provider/src/main/java/org/apache/hugegraph/rocksdb/provider/RocksDBProvider.java Defines SPI interface for provider open/close operations.
hugegraph-rocksdb-provider/src/main/java/org/apache/hugegraph/rocksdb/provider/AbstractRocksDBProvider.java Adds base implementation for provider lifecycle and close handling.
hugegraph-rocksdb-provider/pom.xml New module POM and dependency declarations for provider layer.
hugegraph-pd/hg-pd-dist/src/assembly/static/conf/rocksdb_pd.yaml Adds sample Topling/RocksDB YAML config for PD.
hugegraph-pd/hg-pd-dist/src/assembly/static/conf/application.yml Documents optional rocksdb option-path/open-http config in PD dist.
hugegraph-pd/hg-pd-dist/src/assembly/static/bin/util.sh Adds helper to locate server dist dir for preload integration.
hugegraph-pd/hg-pd-dist/src/assembly/static/bin/start-hugegraph-pd.sh Sources server preload script when available before starting PD.
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/store/HgKVStoreImpl.java Opens/closes RocksDB via provider loader and threads option/http configs through.
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/config/PDConfig.java Adds Spring-configurable rocksdb.option-path and rocksdb.open-http properties.
hugegraph-pd/hg-pd-core/pom.xml Updates dependencies to use provider module and updates jraft version.
hugegraph-pd/hg-pd-cli/pom.xml Updates jraft version.
docs/toplingdb/toplingdb.md Adds ToplingDB feature/design/configuration documentation.
docs/toplingdb/toplingdb-troubleshooting.md Adds troubleshooting guide for YAML/HTTP/lock issues.
docs/toplingdb/toplingdb-security.md Adds security hardening recommendations for ToplingDB usage.
docs/toplingdb/toplingdb-operations.md Adds operational guidance (monitoring/tuning/upgrade).
.specs/hugegraph-server/ToplingDB/task.md Adds implementation task breakdown for ToplingDB work.
.specs/hugegraph-server/ToplingDB/requirements.md Adds requirements spec for ToplingDB support.
.specs/hugegraph-server/ToplingDB/design.md Adds design spec describing preload and open logic.
.github/workflows/server-ci.yml Updates CI to package artifacts, install deps, and enable GitHub Packages access.
.github/workflows/pd-store-ci.yml Updates CI packaging/deps/preload flow and enables GitHub Packages access.
.github/workflows/licence-checker.yml Enables stage repo usage in workflow env.
.github/workflows/commons-ci.yml Enables stage repo usage and adds GitHub Packages env vars.
.github/workflows/codeql-analysis.yml Enables stage repo usage and adds GitHub Packages permissions/env vars.
.github/workflows/cluster-test-ci.yml Enables stage repo usage and adds GitHub Packages env vars.
.github/workflows/check-dependencies.yml Enables stage repo usage and adds GitHub Packages env vars.
.github/configs/settings.xml Adds GitHub Packages repo for ToplingDB snapshot dependency resolution.

Comment thread hugegraph-rocksdb-provider/pom.xml
Member

imbajin left a comment

This review leaves 5 blocking concerns to resolve before merge: ToplingDB currently leaks into the default RocksDB build/startup path, and the platform fallback, Store HTTP lifecycle, explicit config failure semantics, and snapshot/CF lifecycle paths are not fully provider-aware. I verified the touched provider/rocksdb modules compile locally, so the remaining concerns are design/runtime coverage issues rather than simple compilation failures.

Test coverage required before merge

The current test delta is not enough for a storage backend/provider change of this size. Please add tests in priority order so the fix remains rigorous and controllable:

P0 merge blockers
  ├─ explicit Topling config failure semantics
  ├─ unsupported platform fallback
  ├─ Store multi-session open_http lifecycle
  └─ provider-aware restart / snapshot lifecycle

P1 provider contract
  ├─ provider selection and fallback ordering
  ├─ close / cleanup behavior
  └─ invalid path / invalid YAML handling

P2 packaging and script guards
  ├─ preload is opt-in or correctly gated
  └─ no accidental Topling dependency on standard RocksDB startup

P0 - must cover before merge

  1. Explicit rocksdb.option_path failure must fail fast

    • Missing file, unreadable file, invalid YAML, and path outside the allowed config root.
    • Expected result: if users explicitly configure option_path, startup fails clearly instead of silently falling back to standard RocksDB.
  2. Unsupported platform fallback

    • At minimum, assert the platform gate for Linux arm64/aarch64 does not enter Topling preload/provider selection.
    • Expected result: non-supported OS/arch uses standard RocksDB and does not preload librocksdbjni-linux64.so.
  3. Store multi-session open_http lifecycle

    • Create/open more than one Store RocksDBSession with rocksdb.open_http=true.
    • Expected result: Topling HTTP server starts at most once, or Store rejects the ambiguous configuration with a deterministic error.
  4. Provider-aware lifecycle paths

    • Existing Topling DB restart: CF discovery must work through a provider-aware path.
    • Snapshot verify/load: read-only verification must initialize Topling config before opening the snapshot.
    • Expected result: normal open, restart, snapshot verify, and snapshot restore all follow the same provider contract.

P1 - provider contract tests

  1. RocksDBProviderLoader selection and fallback

    • Topling available + configured -> Topling selected.
    • Topling unavailable + not configured -> standard selected.
    • Topling unavailable + explicitly configured -> fail fast, not silent fallback.
  2. Provider cleanup

    • Successful Topling open then close calls provider-specific cleanup (closeAllDB() path).
    • Failed open after partial registration cleans CF handles, RocksDB handle, and repo mapping.
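
The selection rules in the P1 list can be sketched as a small pure function, which keeps the contract easy to test. This is a minimal illustration, not the PR's `RocksDBProviderLoader`: the class `ProviderSelection`, the enum `Backend`, and both parameter names are hypothetical. The "Topling available but not configured" case is not spelled out in the review; mapping it to standard RocksDB is an assumption based on the P2 requirement that standard startup stays independent of Topling.

```java
/** Hypothetical decision table for provider selection; names are illustrative only. */
final class ProviderSelection {

    enum Backend { STANDARD, TOPLING }

    private ProviderSelection() {
    }

    /**
     * @param toplingAvailable  whether the Topling runtime was detected on this host
     * @param toplingConfigured whether the user explicitly set rocksdb.option_path
     */
    static Backend select(boolean toplingAvailable, boolean toplingConfigured) {
        if (toplingConfigured) {
            if (!toplingAvailable) {
                // Explicit request that cannot be honored: fail fast, no silent fallback.
                throw new IllegalStateException(
                        "rocksdb.option_path is set but ToplingDB is not available");
            }
            return Backend.TOPLING;
        }
        // Assumption: without an explicit request, always use standard RocksDB.
        return Backend.STANDARD;
    }
}
```

A test for the loader could then assert each row of the matrix directly, for example that `select(false, true)` throws while `select(false, false)` returns `STANDARD`.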

P2 - packaging / script regression tests

  1. Preload script gating

    • Non-Linux exits cleanly.
    • Linux non-x86_64 exits cleanly.
    • Linux x86_64 only proceeds when Topling is intentionally enabled.
  2. Standard RocksDB startup remains independent

    • With no rocksdb.option_path, standard RocksDB startup should not require Topling native resources, GitHub Packages availability, or runtime downloads.

Without these tests, CI only proves a narrow compile/happy-path case. It does not prove that the new storage provider is safe across fallback, restart, restore, or multi-DB Store scenarios.

Comment thread pom.xml
<hugegraph-commons.version>1.7.0</hugegraph-commons.version>
<lombok.version>1.18.30</lombok.version>
<release.name>hugegraph</release.name>
<rocksdbjni.version>8.10.2-SNAPSHOT</rocksdbjni.version>

‼️ Default RocksDB now depends on a Topling snapshot

The current dependency flow makes ToplingDB part of the default RocksDB path, not an opt-in provider:

standard RocksDB user
  -> root pom selects rocksdbjni:8.10.2-SNAPSHOT
  -> build needs GitHub Packages
  -> Linux startup scripts source Topling preload logic
  -> native preload/runtime assumptions affect default startup

Evidence

  • This line changes the global rocksdbjni.version to 8.10.2-SNAPSHOT.
  • .github/configs/settings.xml adds GitHub Packages as the source for that snapshot.
  • start-hugegraph.sh / init-store.sh source preload-topling.sh unconditionally on the server startup path.

Impact
Users who did not configure rocksdb.option_path can still depend on a mutable snapshot artifact and Topling native preload behavior. That is risky for normal contributor builds, offline deployments, and release reproducibility.

Suggested fix
Keep the default dependency on a released standard rocksdbjni artifact. Select the Topling snapshot only through an explicit profile/configuration path, and only enter native preload when Topling is explicitly enabled.

local dest_dir="$2"
local os_name

# NOTE: The current ToplingDB rocksdbjni snapshot bundles Linux x86_64 native libraries.

‼️ Linux arm64 does not actually fall back to standard RocksDB

The comment says the current ToplingDB snapshot bundles Linux x86_64 native libraries, but the implementation only checks the OS:

Linux arm64/aarch64
  -> os_name == Linux
  -> extract .so from rocksdbjni snapshot
  -> preload librocksdbjni-linux64.so
  -> Java provider may still select Topling by class presence

Evidence

  • This script skips only non-Linux platforms.
  • It later preloads librocksdbjni-linux64.so without checking uname -m.
  • ToplingRocksDBProvider.isAvailable() only checks for org.rocksdb.SidePluginRepo, not CPU architecture.
  • The root POM only has a macOS profile that falls back to standard rocksdbjni; Linux arm64 still gets the Topling snapshot.

Impact
Linux arm64/aarch64 will not follow the documented fallback behavior. It can try to load x86_64 native libraries and fail during startup instead of using standard RocksDB.

Suggested fix
Add the same platform boundary in all three places:

Maven dependency selection: Linux x86_64 -> Topling snapshot, otherwise standard rocksdbjni
Shell preload: return unless uname -s == Linux and uname -m is x86_64/amd64
Java provider availability: return false on unsupported OS/arch

Please also add at least one fallback check for Linux arm64 so this does not regress.
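
On the Java side, the platform boundary could look roughly like the sketch below. The class `ToplingPlatform` and its methods are hypothetical and do not appear in the PR; the only real APIs used are the standard `os.name` and `os.arch` system properties. Accepting both `amd64` and `x86_64` is an assumption that mirrors the `x86_64/amd64` check suggested for the shell script.

```java
import java.util.Locale;

/** Hypothetical helper: decides whether the Topling native libraries can be used on this host. */
final class ToplingPlatform {

    private ToplingPlatform() {
    }

    /** Pure function so the gate can be unit-tested with arbitrary OS/arch strings. */
    static boolean isSupported(String osName, String osArch) {
        if (osName == null || osArch == null) {
            return false;
        }
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean linux = os.startsWith("linux");
        // JVMs commonly report "amd64" on Linux x86_64; "x86_64" is accepted as well.
        boolean x86_64 = arch.equals("amd64") || arch.equals("x86_64");
        return linux && x86_64;
    }

    /** Convenience overload that reads the standard JVM system properties. */
    static boolean isSupported() {
        return isSupported(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

A provider's availability check could call `ToplingPlatform.isSupported()` before probing for `org.rocksdb.SidePluginRepo`, and a regression test can pin the arm64 case with `isSupported("Linux", "aarch64")`.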

- this.rocksDB = RocksDB.open(dbOptions, dbPath, columnFamilyDescriptorList,
-                             columnFamilyHandleList);
+ this.rocksDB =
+         RocksDBProviderLoader.openRocksDB(dbOptions, dbPath, columnFamilyDescriptorList,

‼️ Store can start the Topling HTTP server once per RocksDB session

Server already gates open_http to one logical DB, but Store passes the configured flag into every RocksDBSession:

RocksDBFactory
  -> createGraphDB(db1) -> RocksDBSession -> startHttpServer()
  -> createGraphDB(db2) -> RocksDBSession -> startHttpServer()
  -> same YAML listening_ports

Evidence

  • This call forwards hugeConfig.get(RocksDBOptions.OPEN_HTTP) directly.
  • RocksDBFactory.createGraphDB() can create multiple sessions in the same Store process.
  • ToplingRocksDBProvider.startHttpServerIfNeeded() calls startHttpServer() whenever openHttp is true.
  • Server code has a GRAPH_STORE gate, but Store has no equivalent process-level gate.

Impact
Multiple Store RocksDB instances can compete for the same Topling HTTP port. Startup then depends on open order and may fail partway through DB initialization.

Suggested fix
Move Topling HTTP lifecycle to a process-level singleton, or add an explicit Store-side gating rule so only one chosen DB can start the HTTP server. Please cover the multi-session case in a regression test.
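
One way to read the process-level singleton option is a small gate that runs the start action at most once per process. The sketch below is illustrative only: `ToplingHttpGate` and `startOnce` are hypothetical names, and the `Runnable` stands in for whatever call actually starts the Topling HTTP server.

```java
/** Hypothetical process-level gate; the Runnable stands in for the real HTTP start call. */
final class ToplingHttpGate {

    /** Shared instance for the whole process; tests can create their own gates. */
    static final ToplingHttpGate INSTANCE = new ToplingHttpGate();

    private boolean started;

    /**
     * Runs startAction at most once across all sessions.
     *
     * @return true only for the caller that actually started the server
     */
    synchronized boolean startOnce(boolean openHttp, Runnable startAction) {
        if (!openHttp || started) {
            return false;
        }
        // If the action throws, 'started' stays false so a later session may retry.
        startAction.run();
        started = true;
        return true;
    }

    synchronized boolean isStarted() {
        return started;
    }
}
```

With a gate like this, every session can pass its `open_http` flag through unchanged; only the first caller starts the server, and the multi-session regression test reduces to counting how often the start action ran.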

LOG.warn("SidePluginRepo not found, even though 'optionPath' was provided. " +
"Falling back to the standard RocksDB default CF opening method. " +
"The configuration in '{}' will be ignored.", optionPath);
} catch (Exception e) {

‼️ Explicit Topling config errors should fail fast, not silently fall back

When users configure rocksdb.option_path, the current behavior treats invalid Topling config as a warning and then opens standard RocksDB:

rocksdb.option_path is set
  -> path/YAML validation fails
  -> warn only
  -> validateConfiguration() returns false
  -> standard RocksDB opens successfully

Evidence

  • Path traversal, missing file, unreadable file, and invalid YAML are all caught here as generic exceptions.
  • The method then returns false, which makes the caller use the standard RocksDB path.
  • The provider module also hard-codes ./conf/ as the allowed deployment directory.

Impact
A typo or malformed YAML can make the service start successfully while not using ToplingDB at all. That is dangerous operationally because the user explicitly requested Topling behavior but gets a silent downgrade.

Suggested fix
Only allow fallback when Topling was not requested, or when the platform is explicitly unsupported by design. If option_path is explicitly configured and validation fails, startup should fail fast with a clear error. Path policy should be resolved in Server/PD/Store config layers and passed to the provider as a normalized path, rather than hard-coded in this shared provider.
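
A fail-fast path check along these lines might look like the following sketch. `OptionPathResolver` is a hypothetical name, and the allowed root is passed in by the caller rather than hard-coded, as the suggestion above asks. YAML parsing is deliberately left out because it needs a parser library; only path policy and file checks are shown.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical resolver: an empty result means Topling was not requested at all. */
final class OptionPathResolver {

    private OptionPathResolver() {
    }

    /**
     * @param optionPath  raw value of rocksdb.option_path, may be null or empty
     * @param allowedRoot config root chosen by the Server/PD/Store layer
     * @throws IllegalArgumentException if a path was configured but cannot be used
     */
    static Optional<Path> resolve(String optionPath, Path allowedRoot) {
        if (optionPath == null || optionPath.trim().isEmpty()) {
            return Optional.empty(); // not requested: the caller may use standard RocksDB
        }
        Path root = allowedRoot.toAbsolutePath().normalize();
        Path path = root.resolve(optionPath.trim()).normalize();
        if (!path.startsWith(root)) {
            throw new IllegalArgumentException(
                    "rocksdb.option_path escapes the allowed config root: " + path);
        }
        if (!Files.isRegularFile(path)) {
            throw new IllegalArgumentException(
                    "rocksdb.option_path is not an existing file: " + path);
        }
        if (!Files.isReadable(path)) {
            throw new IllegalArgumentException(
                    "rocksdb.option_path is not readable: " + path);
        }
        return Optional.of(path);
    }
}
```

The key design point is that "not configured" and "configured but broken" are separate outcomes: the first returns an empty `Optional`, the second throws, so the caller cannot mistake a typo for a request to use standard RocksDB.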

- this.rocksDB = RocksDB.open(dbOptions, dbPath, columnFamilyDescriptorList,
-                             columnFamilyHandleList);
+ this.rocksDB =
+         RocksDBProviderLoader.openRocksDB(dbOptions, dbPath, columnFamilyDescriptorList,

‼️ Snapshot verification still bypasses the provider path

Normal open now goes through RocksDBProviderLoader, but this recovery path still uses raw RocksDB APIs:

Topling DB / snapshot
  -> verifySnapshot()
  -> RocksDB.listColumnFamilies()
  -> RocksDB.openReadOnly()
  -> no option_path / SidePluginRepo initialization

Evidence

  • Snapshot verification calls RocksDB.listColumnFamilies() directly here.
  • It then calls RocksDB.openReadOnly() directly below.
  • Store startup has a similar pre-open direct listColumnFamilies() path, and Server listCFs() also bypasses the provider.

Impact
If a Topling database or snapshot relies on YAML-defined factories/options, restart or snapshot restore can fail before the provider gets a chance to import the Topling configuration. This leaves lifecycle paths inconsistent with the new normal open path.

Suggested fix
Add provider-aware helpers for CF listing and read-only snapshot verification. At minimum, when rocksdb.option_path is set, initialize/import the Topling config before CF discovery and read-only verification. Please add regression coverage for Topling DB restart and snapshot verify/load.
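
The ordering requirement (import the Topling config before CF discovery and read-only verification) can be illustrated with a thin wrapper. Everything here is hypothetical: `RawStoreOps` abstracts the raw calls that real code would delegate to RocksDB and `SidePluginRepo`, and `ProviderAwareOps` is not a class in the PR. The sketch shows only the ordering guarantee, not real database access.

```java
import java.util.List;

/** Hypothetical low-level operations; real code would delegate to RocksDB / SidePluginRepo. */
interface RawStoreOps {
    void importConfig(String optionPath);

    List<String> listColumnFamilies(String dbPath);

    void openReadOnly(String dbPath, List<String> columnFamilies);
}

/** Hypothetical wrapper that makes every lifecycle entry point import the config first. */
final class ProviderAwareOps {

    private final RawStoreOps raw;
    private final String optionPath; // null or empty means Topling was not requested
    private boolean imported;

    ProviderAwareOps(RawStoreOps raw, String optionPath) {
        this.raw = raw;
        this.optionPath = optionPath;
    }

    private synchronized void ensureConfigImported() {
        if (imported || optionPath == null || optionPath.isEmpty()) {
            return;
        }
        raw.importConfig(optionPath);
        imported = true;
    }

    /** CF discovery used on restart. */
    List<String> listColumnFamilies(String dbPath) {
        ensureConfigImported();
        return raw.listColumnFamilies(dbPath);
    }

    /** Read-only snapshot verification. */
    void verifySnapshot(String snapshotPath) {
        ensureConfigImported();
        List<String> columnFamilies = raw.listColumnFamilies(snapshotPath);
        raw.openReadOnly(snapshotPath, columnFamilies);
    }
}
```

A regression test can then inject a recording fake for `RawStoreOps` and assert that the import call precedes listing and opening, and that it is skipped entirely when no option path is set.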

Labels

dependencies (Incompatible dependencies of package), feature (New feature), pd (PD module), size:XXL (This PR changes 1000+ lines, ignoring generated files), store (Store module)

Projects

Status: In progress

4 participants