Skip to content

feat: introduce Module framework for persistent storage modules#407

Closed
zhanglei1949 wants to merge 2 commits into
alibaba:mainfrom
zhanglei1949:zl/module-framework
Closed

feat: introduce Module framework for persistent storage modules#407
zhanglei1949 wants to merge 2 commits into
alibaba:mainfrom
zhanglei1949:zl/module-framework

Conversation

@zhanglei1949
Copy link
Copy Markdown
Member

@zhanglei1949 zhanglei1949 commented May 26, 2026

Summary

Adds the Module abstract interface, ModuleFactory singleton registry, and ModuleDescriptor metadata struct — infrastructure for the upcoming checkpoint / snapshot store work to persist and restore extensible storage modules in a type-erased way.

This PR is pure addition with no caller. Reviewers are asked to look at the API shape rather than usage; the callers (the four Module-inheriting base classes and the Checkpoint / SnapshotStore that drive them) land in the follow-up workspace / checkpoint PR.

This is PR-B1 of the #370 split — it can be reviewed in parallel with #401 (PR-A: storage view layer refactor), since the two are independent.

What's in the framework

  • Module (include/neug/storages/module/module.h) — abstract base with four lifecycle hooks: Open(Checkpoint&, ModuleDescriptor&, MemoryLevel), Dump(Checkpoint&) -> ModuleDescriptor, Close(), Fork(Checkpoint&, MemoryLevel) -> unique_ptr<Module>. Both Dump and Fork are expected to write into a UUID sub-directory under the checkpoint runtime dir.
  • ModuleFactory (include/neug/storages/module/module_factory.h, src/storages/module/module_factory.cc) — singleton registry mapping module_type strings to creator functions. NEUG_REGISTER_MODULE(Class) / NEUG_REGISTER_TEMPLATE_MODULE(Tmpl, T) macros perform static-init registration; the template variant uses __COUNTER__ so types containing :: (e.g. std::string_view) still get a stable unique function identifier.
  • ModuleDescriptor (include/neug/storages/module_descriptor.h, src/storages/graph/module_descriptor.cc) — metadata struct round-trippable through rapidjson, carrying path / size / module_type plus arbitrary extra KV pairs and recursive sub-module descriptors. Pimpl is used for the sub-module map to avoid the incomplete-type issue of a self-referential unordered_map inside the struct definition.
  • StorageTypeName<T> (include/neug/storages/module/type_name.h) — trait mapping storage value types (int32_t, std::string_view, Date, ...) to short factory key suffixes.
  • UUIDGenerator (include/neug/utils/uuid.h, src/utils/uuid.cc) — small RFC-4122-style UUID generator used to name per-Dump / per-Fork sub-directories.

Wiring

  • src/storages/module/CMakeLists.txt — new neug_storages_module OBJECT library (GLOB).
  • src/storages/CMakeLists.txt — adds add_subdirectory(module) and wires the new OBJECT files into NEUG_STORAGES_OBJFILES.
  • module_descriptor.cc is placed under src/storages/graph/ (and picked up by the existing GLOB there) because it instantiates the pimpl that wraps a self-referential map of ModuleDescriptors — keeping it in the same translation unit family avoids a circular dependency between neug_storages_module and neug_property_graph.
  • uuid.cc is placed under src/utils/ (existing GLOB picks it up).

No changes to existing files other than the two-line src/storages/CMakeLists.txt edit.

Review focus

  • API shape: are the four Module lifecycle hooks the right surface, given they will be driven by Checkpoint / SnapshotStore in the follow-up?
  • ModuleDescriptor shape: extra_ (free-form KV) + sub_modules_ (recursive) + JSON round-trip is enough metadata for the downstream callers; reviewers may want to push back if a flatter shape would do.
  • Registration macros: __attribute__((constructor))-based static-init registration is intentional (so callers don't need an explicit init step). The __COUNTER__ indirection in NEUG_REGISTER_TEMPLATE_MODULE is for ::-containing type names — please double-check it works on your toolchain.

Fixes #408

Add the Module abstract interface, ModuleFactory singleton registry, and
ModuleDescriptor metadata struct. These provide the infrastructure that
the upcoming checkpoint / snapshot store work will use to persist and
restore extensible storage modules in a type-erased way.

Module declares four lifecycle hooks (Open / Dump / Close / Fork) that
operate against a Checkpoint instance. ModuleFactory maps short
type-name strings to creator functions for deserialization, with
NEUG_REGISTER_MODULE / NEUG_REGISTER_TEMPLATE_MODULE macros for
static-init registration. ModuleDescriptor carries the metadata required
to round-trip a module through JSON, including arbitrary key-value
extras and recursive sub-module descriptors.

Also adds a small UUIDGenerator utility used to name per-Dump and
per-Fork sub-directories under the checkpoint runtime dir, and a
StorageTypeName trait that maps storage value types to factory key
suffixes.

This PR introduces the framework only; no callers yet. The four base
classes (CsrBase, VertexTimestamp, LFIndexer, ColumnBase) that inherit
Module and the checkpoint / snapshot_store that drives them land in the
follow-up workspace / checkpoint PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Introduce Module framework for persistent storage modules

1 participant