refactor: Introduce Module, Checkpoint CheckpointManager for better management of database directory#148
Conversation
ⓘ You are approaching your monthly quota for Qodo. Upgrade your plan Review Summary by QodoRefactor storage management with Module abstraction, Workspace/Checkpoint framework, and ModuleFactory registry
WalkthroughsDescription• Introduces comprehensive module abstraction layer with Module base class defining unified lifecycle interface (Open, Dump, Close, Fork) for all storage components • Implements ModuleFactory singleton with type-erased instantiation and registration mechanism via NEUG_REGISTER_MODULE macro • Introduces Workspace and Checkpoint abstractions for managing numbered checkpoint directories with snapshot/runtime/wal subdirectories • Implements ModuleDescriptor metadata structure with JSON serialization for module inventory tracking • Introduces SnapshotMeta module managing checkpoint's module inventory and persisting metadata as JSON • Refactors PropertyGraph, VertexTable, EdgeTable, Schema, CsrBase, ColumnBase, and LFIndexer to inherit from Module and implement the unified lifecycle interface • Registers 78+ template specializations (MutableCsr, ImmutableCsr, TypedColumn) with ModuleFactory covering all data types • Adds StorageTypeName template trait for C++ type to string mapping with macro-based specialization • Refactors NeugDB to use Workspace for checkpoint discovery and opens graphs from latest checkpoint • Updates all storage tests and benchmarks to use new Checkpoint-based API instead of legacy directory management • Creates new CMake target neug_storages_module for module factory and base classes Diagramflowchart LR
A["Module Base Class<br/>Open/Dump/Close/Fork"] --> B["ModuleFactory<br/>Registry & Creation"]
C["Workspace<br/>Checkpoint Management"] --> D["Checkpoint<br/>Numbered Directories"]
D --> E["SnapshotMeta<br/>Module Inventory"]
E --> F["Schema<br/>VertexTable<br/>EdgeTable"]
F --> G["CsrBase<br/>ColumnBase<br/>LFIndexer"]
G --> B
H["PropertyGraph"] --> C
H --> A
I["NeugDB"] --> H
File Changes1. include/neug/storages/module/module.h
|
Code Review by Qodo
1. Checkpoint restore drops data
|
516e96f to
5bc7c6d
Compare
b0c0529 to
382166b
Compare
ac894ec to
347749a
Compare
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
fd47ddd to
2e9e53a
Compare
46823b4 to
96eb6ed
Compare
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Refactors persistent storage management to revolve around Workspace + Checkpoint + ModuleDescriptor (module-based Open/Dump/Fork), and updates graph/table/CSR codepaths and tests accordingly (Fix #133).
Changes:
- Introduce checkpointed workspace layout (
Workspace,Checkpoint,SnapshotMeta,ModuleDescriptor) and module registry (ModuleFactory). - Migrate
PropertyGraph,VertexTable,EdgeTable,Table/Column, and CSR implementations to the new module-driven Open/Dump/Fork API. - Update/extend unit & storage tests for the new checkpoint/module workflow and add new fork/dump matrix tests.
Reviewed changes
Copilot reviewed 67 out of 67 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/utils/test_table.cc | Updates table tests to use Workspace/Checkpoint and descriptors instead of hardcoded dirs. |
| tests/unittest/utils.h | Adds standard library includes needed by updated unit test helpers. |
| tests/unittest/test_column.cc | New typed tests for column fork isolation + dump/open matrix. |
| tests/unittest/logical_delete_test.cc | Migrates PropertyGraph open to checkpoint-based API. |
| tests/unittest/CMakeLists.txt | Adds new test_column target. |
| tests/storage/vertex_table_benchmark.cc | Updates benchmark to use checkpoint/module descriptors. |
| tests/storage/test_vertex_table.cc | Migrates vertex table tests to checkpoint-based Open/Dump. |
| tests/storage/test_set_property.cc | Includes updated unittest utils header. |
| tests/storage/test_property_graph.cc | Migrates property graph test fixture to checkpoint-based Open. |
| tests/storage/test_open_graph.cc | Includes updated unittest utils header and tweaks logging. |
| tests/storage/test_mutable_csr.cc | Adds fork/dump tests and migrates CSR tests to checkpoint-based API. |
| tests/storage/test_immutable_csr.cc | Migrates immutable CSR tests to checkpoint-based API and adds workspace usage. |
| tests/storage/test_export.cc | Includes updated unittest utils header. |
| tests/storage/test_ddl.cc | Includes updated unittest utils header. |
| tests/storage/test_csr_stream_ops.cc | Migrates CSR stream tests to checkpoint-based API and workspace-managed dirs. |
| tests/storage/test_csr_batch_ops.cc | Migrates CSR batch tests to checkpoint-based Open/Dump. |
| tests/storage/alter_property_test.cc | Migrates alter-property test to open graph via checkpoint. |
| tests/execution/test_value.cc | Minor formatting cleanup for EXPECT_DEATH. |
| src/utils/uuid.cc | Adds UUID generator implementation. |
| src/utils/property/table.cc | Refactors Table to module-based Open/Dump/Fork and registers column module types. |
| src/utils/id_indexer.cc | Registers LF indexer module type. |
| src/storages/module/module_factory.cc | Adds module factory implementation and registry. |
| src/storages/module/CMakeLists.txt | Adds object library for module sources. |
| src/storages/loader/abstract_property_graph_loader.cc | Loader now creates a workspace+checkpoint and uses checkpoint meta schema. |
| src/storages/graph/workspace.cc | New workspace implementation that discovers/creates checkpoints. |
| src/storages/graph/vertex_timestamp.cc | Migrates vertex timestamp to module-based Open/Dump/Fork and registers module. |
| src/storages/graph/vertex_table.cc | Migrates vertex table to module-based Open/Dump and descriptor submodules. |
| src/storages/graph/snapshot_meta.cc | Adds JSON-persisted snapshot meta (schema + module inventory). |
| src/storages/graph/schema.cc | Adds Schema JSON (de)serialization via YAML<->JSON conversion and fixes edge-strategy labels. |
| src/storages/graph/property_graph.cc | Migrates graph Open/Dump to checkpoint/meta-driven module open. |
| src/storages/graph/module_descriptor.cc | Implements ModuleDescriptor JSON and submodule tree handling. |
| src/storages/graph/edge_table.cc | Migrates edge table to module-based Open/Dump and property mutations using checkpoint. |
| src/storages/graph/checkpoint.cc | New checkpoint implementation (dirs, meta persistence, commit semantics, cleanup). |
| src/storages/csr/mutable_csr.cc | Migrates mutable CSR variants to module API, implements descriptor-based dump/open, registers module types. |
| src/storages/csr/immutable_csr.cc | Migrates immutable CSR variants to module API, implements dump/open, registers module types. |
| src/storages/container/mmap_container.cc | Improves mmap resize error and adds Fork for containers. |
| src/storages/container/container_utils.cc | Improves error logging when tmp path is missing in disk-backed mode. |
| src/storages/CMakeLists.txt | Adds module/ subdir into storages build. |
| src/server/neug_db_service.cc | Updates session pool init after WAL path handling changes. |
| src/main/neug_db.cc | Migrates DB open/ingest/checkpoint flow to workspace+checkpoint model. |
| include/neug/utils/uuid.h | Declares UUIDGenerator. |
| include/neug/utils/property/table.h | Updates Table API to checkpoint/module descriptor and adds Fork. |
| include/neug/utils/property/column.h | Migrates columns to Module interface and implements Open/Dump/Fork/type names. |
| include/neug/utils/id_indexer.h | Migrates LFIndexer to Module interface and adds Open/Dump/Fork. |
| include/neug/storages/workspace.h | Declares Workspace API. |
| include/neug/storages/snapshot_meta.h | Declares SnapshotMeta (schema + modules) and key helpers. |
| include/neug/storages/module_descriptor.h | Declares ModuleDescriptor with JSON + submodules. |
| include/neug/storages/module/type_name.h | Adds StorageTypeName<> trait for module type strings. |
| include/neug/storages/module/module_factory.h | Declares ModuleFactory + registration macro. |
| include/neug/storages/module/module.h | Declares the module lifecycle interface. |
| include/neug/storages/loader/abstract_property_graph_loader.h | Loader now owns a Workspace and seeds checkpoint meta schema. |
| include/neug/storages/graph/vertex_timestamp.h | VertexTimestamp now implements Module. |
| include/neug/storages/graph/vertex_table.h | VertexTable migrated to descriptor-based Open/Dump; indexer is now a pointer. |
| include/neug/storages/graph/schema.h | Adds Schema JSON API declarations. |
| include/neug/storages/graph/property_graph.h | Switches graph open/dump API to checkpoint-based, stores Checkpoint*. |
| include/neug/storages/graph/edge_table.h | Migrates EdgeTable API to checkpoint-based Open/Dump/Close and property mutations. |
| include/neug/storages/file_names.h | Removes temp_checkpoint_dir helper (no longer used with checkpoint model). |
| include/neug/storages/csr/mutable_csr.h | Migrates CSR base implementations to module API and adds Fork/type names. |
| include/neug/storages/csr/immutable_csr.h | Migrates immutable CSR to module API and defines module type names. |
| include/neug/storages/csr/csr_base.h | Makes CSR base inherit Module and adds fork_vertex hook. |
| include/neug/storages/container/mmap_container.h | Declares container Fork. |
| include/neug/storages/container/i_container.h | Adds container Fork virtual method. |
| include/neug/storages/checkpoint.h | Declares Checkpoint class and helpers (commit/meta/containers). |
| include/neug/server/session_pool.h | SessionPool now derives WAL location from graph.checkpoint(). |
| include/neug/main/neug_db.h | NeugDB::work_dir() now returns workspace db dir; adds Workspace member. |
Comments suppressed due to low confidence (12)
src/storages/graph/property_graph.cc:1
SnapshotMeta::edge_table_keyis declared as(src_label, edge_label, dst_label)ininclude/neug/storages/snapshot_meta.h, but here it’s called as(src_label, dst_label, edge_label). This will break meta lookups (and likely won’t compile if overloads don’t match). Align the argument order either in the helper signature or at all call sites.
src/storages/graph/vertex_table.cc:1Tablewas refactored toOpen/Dump/Forkand no longer shows aclose()API in the updated header diff, sotable_->close()is very likely a compile error. Either reintroduce aTable::Close()method (closing all columns’ modules) and call that, or update all callers to the new lifecycle method name consistently.
src/storages/graph/edge_table.cc:1- Same issue as VertexTable:
table_->close()is inconsistent with the refactoredTableAPI and is likely missing now. Consider adding and usingTable::Close()consistently from both VertexTable and EdgeTable.
src/utils/property/table.cc:1 add_columnsignores thememory_levelargument and always opens new columns askInMemory. This changes behavior for disk-backed or hugepage configurations. Use the providedmemory_level(and a stable per-column sub-descriptor/path strategy) so added columns follow the table’s storage policy.
src/utils/property/table.cc:1add_columnsignores thememory_levelargument and always opens new columns askInMemory. This changes behavior for disk-backed or hugepage configurations. Use the providedmemory_level(and a stable per-column sub-descriptor/path strategy) so added columns follow the table’s storage policy.
src/storages/graph/vertex_timestamp.cc:1- When
descis default-constructed,desc.pathis empty;std::filesystem::exists(\"\")can resolve to the current directory on some platforms/libstdc++ versions, causing an unintendedload_tscall with an empty path. Guard explicitly with!desc.path.empty()before checkingexists().
tests/utils/test_table.cc:1 - This uses
::getpid()but the file diff doesn’t show an include for<unistd.h>(or an equivalent platform header). This can fail to compile on some toolchains. Add the appropriate include or replacegetpid()usage with a portable alternative.
tests/storage/vertex_table_benchmark.cc:1 VertexTable::Openexpects theModuleDescriptorto contain sub-modules (indexer,property_table,vertex_timestamp). Providing onlymodule_type+pathwill result in empty sub-descriptors and can cause runtime failures (e.g., LFIndexer needing key type). For a benchmark, either (1) create a new empty table by passing a default descriptor but ensuring the indexer is initialized with pk type and opened accordingly, or (2) load a real descriptor produced byVertexTable::Dump.
src/storages/graph/checkpoint.cc:1- This constructs a new
UUIDGenerator(and thus a newstd::random_device/PRNG) on every loop iteration. Consider reusing a static/shared generator (e.g.,generate_uuid()already uses a static generator) to reduce overhead, especially since this is called frequently for runtime object creation.
src/utils/uuid.cc:1 <regex>is included but not used in this translation unit. Removing unused includes reduces compile time and avoids confusion about intended validation logic.
src/storages/graph/checkpoint.cc:1- Checkpoint meta updates now drive deletion of orphaned snapshot files. This is correctness-critical and potentially destructive; please add focused tests that (1) write meta referencing a subset of UUID files, (2) call
UpdateMeta, and (3) verify only unreferenced UUID files are deleted while referenced ones remain.
src/storages/graph/checkpoint.cc:1 - Checkpoint meta updates now drive deletion of orphaned snapshot files. This is correctness-critical and potentially destructive; please add focused tests that (1) write meta referencing a subset of UUID files, (2) call
UpdateMeta, and (3) verify only unreferenced UUID files are deleted while referenced ones remain.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
de32e4d to
31c325f
Compare
5d82878 to
e5202af
Compare
refine module_store and module_desc refactor snapshot_meta/checkpoint, remove unncessary changes remove module store minor ref ref wal apply refine refine checkpoint manage fix ci minor refine fix minor refine refine module register macro fix format rm file_names.h minor refine minor refine supporing fast dump remove some cmments TBD: add dirty flag for MutableCSR, to avoid redumping the data minor fix remove method generate_uuid fix test revert is_dirty_ flag and compute md5 for mutable_csr's nbr_list before dump fix test check whether data_buffer is modified at column level refine remove unused field last_descriptor cherry-pick c964e58 fix issues reported by aone copilot two fold check remove some comment revert changes to indexer_ in VertexTable minor Committed-by: xiaolei.zl from Dev container Committed-by: xiaolei.zl from Dev container refine test cases fix iterate edge schemas from schema Remove dummy function remove fork related code, separate module open from v/etable construction fix format fix test
fix fix pytest remove dead code rename func make indexer's indices column, not raw container format
3c6c186 to
f1314d8
Compare
Fix #133
FIx #125
Fix #271
Fix #262
This pull request introduces significant refactoring and modernization of the storage and checkpointing system, particularly around the CSR (Compressed Sparse Row) graph data structures and the database workspace management. The changes focus on improving modularity, encapsulation, and type safety, as well as unifying the interface for storage modules. The most notable changes are the introduction of a new
Checkpointabstraction, updates to the CSR classes to use a unified module interface, and removal of legacy methods.Storage and Checkpoint System Modernization:
Checkpointclass (include/neug/storages/checkpoint.h) to encapsulate all logic and metadata for managing checkpoint directories, snapshotting, and write-ahead logs. This class provides thread-safe operations and a clear API for persistent storage management.NeugDBto use aWorkspaceobject instead of a rawwork_dir_string, and refactored related methods to leverage the new workspace and checkpoint abstractions. [1] [2] [3] [4]CSR Graph Storage Refactoring:
CsrBase,ImmutableCsr,SingleImmutableCsr,MutableCsr,SingleMutableCsr) to inherit from a newModuleinterface, and replaced legacy open/dump/close methods with a unifiedOpen,Dump, andCloseinterface that works with the newCheckpointandModuleDescriptortypes. [1] [2] [3] [4] [5] [6] [7]ModuleTypeName,type_name) to CSR classes for improved type safety and module management. [1] [2] [3]API and Code Cleanliness Improvements:
Other Minor Improvements:
GetHeaderhelper toMMapContainerfor safer access to file headers.These changes collectively modernize the storage layer, improve maintainability, and lay groundwork for future enhancements in persistence and modularity.
Open & Close LDBC SNB sf300 takes 13mins