Skip to content

refactor: Introduce Module, Checkpoint CheckpointManager for better management of database directory#148

Merged
zhanglei1949 merged 19 commits into
alibaba:mainfrom
zhanglei1949:zl/workspace-ref
Jun 4, 2026
Merged

refactor: Introduce Module, Checkpoint CheckpointManager for better management of database directory#148
zhanglei1949 merged 19 commits into
alibaba:mainfrom
zhanglei1949:zl/workspace-ref

Conversation

@zhanglei1949
Copy link
Copy Markdown
Member

@zhanglei1949 zhanglei1949 commented Mar 31, 2026

Fix #133
FIx #125
Fix #271
Fix #262
This pull request introduces significant refactoring and modernization of the storage and checkpointing system, particularly around the CSR (Compressed Sparse Row) graph data structures and the database workspace management. The changes focus on improving modularity, encapsulation, and type safety, as well as unifying the interface for storage modules. The most notable changes are the introduction of a new Checkpoint abstraction, updates to the CSR classes to use a unified module interface, and removal of legacy methods.

Storage and Checkpoint System Modernization:

  • Introduced a new Checkpoint class (include/neug/storages/checkpoint.h) to encapsulate all logic and metadata for managing checkpoint directories, snapshotting, and write-ahead logs. This class provides thread-safe operations and a clear API for persistent storage management.
  • Updated NeugDB to use a Workspace object instead of a raw work_dir_ string, and refactored related methods to leverage the new workspace and checkpoint abstractions. [1] [2] [3] [4]

CSR Graph Storage Refactoring:

  • Refactored all CSR base and derived classes (CsrBase, ImmutableCsr, SingleImmutableCsr, MutableCsr, SingleMutableCsr) to inherit from a new Module interface, and replaced legacy open/dump/close methods with a unified Open, Dump, and Close interface that works with the new Checkpoint and ModuleDescriptor types. [1] [2] [3] [4] [5] [6] [7]
  • Added type introspection methods (ModuleTypeName, type_name) to CSR classes for improved type safety and module management. [1] [2] [3]

API and Code Cleanliness Improvements:

  • Removed legacy and redundant open/close/dump methods from CSR classes, consolidating them into the new modular interface. [1] [2] [3] [4] [5] [6] [7] [8]
  • Updated session pool initialization to use the new checkpoint-based WAL directory, improving configuration consistency.

Other Minor Improvements:

  • Added header includes for new dependencies and type utilities. [1] [2] [3]
  • Added a GetHeader helper to MMapContainer for safer access to file headers.

These changes collectively modernize the storage layer, improve maintainability, and lay groundwork for future enhancements in persistence and modularity.

Open & Close LDBC SNB sf300 takes 13mins

@zhanglei1949 zhanglei1949 marked this pull request as draft March 31, 2026 06:10
@qodo-code-review
Copy link
Copy Markdown

ⓘ You are approaching your monthly quota for Qodo. Upgrade your plan

Review Summary by Qodo

Refactor storage management with Module abstraction, Workspace/Checkpoint framework, and ModuleFactory registry

✨ Enhancement 🧪 Tests

Grey Divider

Walkthroughs

Description
• Introduces comprehensive module abstraction layer with Module base class defining unified
  lifecycle interface (Open, Dump, Close, Fork) for all storage components
• Implements ModuleFactory singleton with type-erased instantiation and registration mechanism via
  NEUG_REGISTER_MODULE macro
• Introduces Workspace and Checkpoint abstractions for managing numbered checkpoint directories
  with snapshot/runtime/wal subdirectories
• Implements ModuleDescriptor metadata structure with JSON serialization for module inventory
  tracking
• Introduces SnapshotMeta module managing checkpoint's module inventory and persisting metadata as
  JSON
• Refactors PropertyGraph, VertexTable, EdgeTable, Schema, CsrBase, ColumnBase, and
  LFIndexer to inherit from Module and implement the unified lifecycle interface
• Registers 78+ template specializations (MutableCsr, ImmutableCsr, TypedColumn) with ModuleFactory
  covering all data types
• Adds StorageTypeName template trait for C++ type to string mapping with macro-based
  specialization
• Refactors NeugDB to use Workspace for checkpoint discovery and opens graphs from latest
  checkpoint
• Updates all storage tests and benchmarks to use new Checkpoint-based API instead of legacy
  directory management
• Creates new CMake target neug_storages_module for module factory and base classes
Diagram
flowchart LR
  A["Module Base Class<br/>Open/Dump/Close/Fork"] --> B["ModuleFactory<br/>Registry & Creation"]
  C["Workspace<br/>Checkpoint Management"] --> D["Checkpoint<br/>Numbered Directories"]
  D --> E["SnapshotMeta<br/>Module Inventory"]
  E --> F["Schema<br/>VertexTable<br/>EdgeTable"]
  F --> G["CsrBase<br/>ColumnBase<br/>LFIndexer"]
  G --> B
  H["PropertyGraph"] --> C
  H --> A
  I["NeugDB"] --> H
Loading

Grey Divider

File Changes

1. include/neug/storages/module/module.h ✨ Enhancement +106/-0

Abstract Module interface for storage lifecycle management

• Introduces abstract Module base class defining unified lifecycle interface for storage
 components
• Declares four core operations: Open, Dump, Close, and Fork driven by Checkpoint context
• Provides forward declaration of Checkpoint to avoid circular includes
• Includes ModuleTypeName() virtual method for factory registration

include/neug/storages/module/module.h


2. include/neug/storages/module/module_factory.h ✨ Enhancement +147/-0

Module factory registry for dynamic type instantiation

• Introduces ModuleFactory singleton for type-erased Module instantiation
• Provides registration mechanism via NEUG_REGISTER_MODULE macro
• Implements Create, List, and Has methods for module discovery
• Includes ModuleRegistrar RAII helper for static registration

include/neug/storages/module/module_factory.h


3. src/storages/module/module_factory.cc ✨ Enhancement +55/-0

ModuleFactory implementation with registration support

• Implements ModuleFactory singleton instance and registration logic
• Provides Create method to instantiate modules by type name
• Implements List and Has methods for module enumeration

src/storages/module/module_factory.cc


View more (34)
4. include/neug/storages/workspace.h ✨ Enhancement +177/-0

Workspace and Checkpoint abstractions for checkpoint management

• Introduces Checkpoint class representing a single numbered checkpoint directory with
 snapshot/runtime/wal subdirs
• Introduces Workspace class managing multiple numbered checkpoints in a database directory
• Provides checkpoint creation, listing, and retrieval operations
• Includes generate_uuid() helper for creating unique snapshot directories

include/neug/storages/workspace.h


5. src/storages/graph/workspace.cc ✨ Enhancement +125/-0

Workspace and Checkpoint implementation

• Implements Checkpoint directory creation and metadata loading/saving
• Implements Workspace checkpoint enumeration and creation with zero-padded ID formatting
• Provides LatestCheckpoint() to retrieve the most recent checkpoint

src/storages/graph/workspace.cc


6. include/neug/storages/module_descriptor.h ✨ Enhancement +181/-0

ModuleDescriptor metadata structure with JSON serialization

• Introduces ModuleDescriptor struct carrying metadata for Module serialization/deserialization
• Supports extra key-value pairs and nested sub-module descriptors
• Provides JSON serialization via rapidjson with ToJson/FromJson methods
• Uses pimpl pattern for self-referential sub_modules map

include/neug/storages/module_descriptor.h


7. src/storages/graph/module_descriptor.cc ✨ Enhancement +226/-0

ModuleDescriptor serialization and lifecycle implementation

• Implements ModuleDescriptor Rule-of-5 (copy/move constructors and assignment)
• Implements JSON serialization/deserialization with rapidjson
• Implements sub-module accessor methods with lazy initialization

src/storages/graph/module_descriptor.cc


8. include/neug/storages/snapshot_meta.h ✨ Enhancement +144/-0

SnapshotMeta module for checkpoint inventory management

• Introduces SnapshotMeta class as Module that manages checkpoint's module inventory
• Provides canonical key helpers for schema, vertex tables, and edge tables
• Implements JSON persistence at checkpoint meta path

include/neug/storages/snapshot_meta.h


9. src/storages/graph/snapshot_meta.cc ✨ Enhancement +145/-0

SnapshotMeta implementation with JSON persistence

• Implements SnapshotMeta module lifecycle: Open loads JSON, Dump persists JSON
• Implements module map access methods and canonical key generation
• Registers SnapshotMeta with ModuleFactory

src/storages/graph/snapshot_meta.cc


10. include/neug/storages/graph/vertex_table.h ✨ Enhancement +40/-6

VertexTable Module interface integration

• Makes VertexTable inherit from Module base class
• Adds snapshot_dir_ member to track snapshot directory location
• Implements Module interface: Open(ckp, desc), Dump(ckp), Close(), Fork(ckp, level)
• Adds DumpToDir() for legacy flat-directory dumping

include/neug/storages/graph/vertex_table.h


11. src/storages/graph/vertex_table.cc ✨ Enhancement +86/-3

VertexTable Module implementation with Dump/Fork

• Implements Dump(ckp) creating UUID subdirectory under checkpoint runtime dir
• Implements DumpToDir() for legacy flat-directory dumps
• Implements Fork() copying snapshot files and creating independent instance
• Registers VertexTable with ModuleFactory
• Tracks snapshot_dir_ in Open() methods

src/storages/graph/vertex_table.cc


12. include/neug/storages/graph/edge_table.h ✨ Enhancement +31/-4

EdgeTable Module interface integration

• Makes EdgeTable inherit from Module base class
• Adds snapshot_dir_ member to track snapshot directory
• Implements Module interface: Open(ckp, desc), Dump(ckp), Close(), Fork(ckp, level)
• Adds DumpToDir() for legacy flat-directory dumping

include/neug/storages/graph/edge_table.h


13. src/storages/graph/edge_table.cc ✨ Enhancement +57/-1

EdgeTable Module implementation with Dump/Fork

• Implements Dump(ckp) creating UUID subdirectory and returning descriptor
• Implements DumpToDir() for legacy flat-directory dumps
• Implements Fork() forking CSR and table components independently
• Implements Close() method
• Registers EdgeTable with ModuleFactory
• Tracks snapshot_dir_ in move constructor and Swap()

src/storages/graph/edge_table.cc


14. include/neug/storages/graph/property_graph.h ✨ Enhancement +10/-0

PropertyGraph Checkpoint-based Open method

• Adds new Open(ckp, memory_level) overload accepting Checkpoint
• Enables opening graph from checkpoint via Module interface

include/neug/storages/graph/property_graph.h


15. src/storages/graph/property_graph.cc ✨ Enhancement +139/-56

PropertyGraph Checkpoint-based Open and refactored Dump

• Implements Open(ckp, memory_level) loading SnapshotMeta and opening schema/vertex/edge tables
• Refactors Dump() to use Workspace/Checkpoint and SnapshotMeta instead of temp directories
• Replaces temp checkpoint directory logic with UUID-based runtime directories
• Persists SnapshotMeta JSON instead of separate schema file

src/storages/graph/property_graph.cc


16. include/neug/storages/graph/schema.h ✨ Enhancement +32/-1

Schema Module interface methods

• Adds Module interface methods: Open(ckp, desc), Dump(ckp), Close(), Fork(ckp, level)

include/neug/storages/graph/schema.h


17. src/storages/graph/schema.cc ✨ Enhancement +52/-0

Schema Module implementation with binary/YAML persistence

• Implements Open() deserializing binary schema from descriptor path
• Implements Dump() creating UUID directory and writing binary + YAML schema
• Implements Close() as no-op for in-memory object
• Implements Fork() returning copy of schema
• Registers Schema with ModuleFactory

src/storages/graph/schema.cc


18. include/neug/storages/csr/csr_base.h ✨ Enhancement +47/-1

CsrBase Module interface integration

• Makes CsrBase inherit from Module base class
• Implements Module interface: Open(ckp, desc), Dump(ckp), Close(), Fork(ckp, level)
• Adds abstract ForkAsCsr() method for typed fork implementations

include/neug/storages/csr/csr_base.h


19. include/neug/storages/csr/mutable_csr.h ✨ Enhancement +36/-0

MutableCsr Module Fork and type name implementations

• Implements ForkAsCsr() for MutableCsr, SingleMutableCsr, and EmptyCsr templates
• Implements ModuleTypeName() returning type-specific names like "mutable_csr_int64"

include/neug/storages/csr/mutable_csr.h


20. src/storages/csr/mutable_csr.cc ✨ Enhancement +77/-0

MutableCsr ModuleFactory registrations for all types

• Registers 33 CSR template specializations with ModuleFactory (MutableCsr, SingleMutableCsr,
 EmptyCsr)
• Covers all data types: empty, bool, int32/64, uint32/64, float, double, Date, DateTime, Interval

src/storages/csr/mutable_csr.cc


21. src/storages/csr/immutable_csr.cc ✨ Enhancement +55/-0

ImmutableCsr ModuleFactory registrations for all types

• Registers 33 ImmutableCsr template specializations with ModuleFactory
• Covers all data types: empty, bool, int32/64, uint32/64, float, double, Date, DateTime, Interval

src/storages/csr/immutable_csr.cc


22. include/neug/utils/property/column.h ✨ Enhancement +94/-1

ColumnBase Module interface integration

• Makes ColumnBase inherit from Module base class
• Implements Module interface: Open(ckp, desc), Dump(ckp), Close(), Fork(ckp, level)
• Adds abstract ForkAsColumn() method for typed implementations
• Adds opened_prefix_ and opened_dir_ members for tracking state

include/neug/utils/property/column.h


23. include/neug/utils/property/table.cc ✨ Enhancement +0/-0

Table Fork implementation and TypedColumn registrations

• Implements Fork() creating independent table copy with forked columns
• Registers 12 TypedColumn specializations with ModuleFactory
• Covers all types: empty, bool, int32/64, uint32/64, float, double, Date, DateTime, Interval,
 string

include/neug/utils/property/table.cc


24. include/neug/utils/id_indexer.h ✨ Enhancement +106/-1

LFIndexer Module interface integration

• Makes LFIndexer inherit from Module base class
• Implements Module interface: Open(ckp, desc), Dump(ckp), Fork(ckp, level)
• Adds name_, snapshot_dir_, and memory_level_ members
• Implements set_name() and ModuleTypeName() methods

include/neug/utils/id_indexer.h


25. src/main/neug_db.cc ✨ Enhancement +11/-1

NeugDB checkpoint-based graph opening

• Modifies openGraphAndSchema() to use Workspace for checkpoint discovery
• Opens graph from latest checkpoint if available, otherwise opens empty graph
• Logs checkpoint path when opening from checkpoint

src/main/neug_db.cc


26. tests/storage/test_vertex_table.cc 🧪 Tests +28/-44

VertexTable tests refactored for Checkpoint API

• Replaces manual checkpoint directory creation with Workspace::CreateCheckpoint()
• Updates tests to use ModuleDescriptor and Checkpoint-based Open(ckp, desc) API
• Removes manual temp/checkpoint directory renaming logic
• Simplifies test setup by using workspace abstraction

tests/storage/test_vertex_table.cc


27. tests/storage/test_edge_table.cc 🧪 Tests +16/-3

EdgeTable tests refactored for Checkpoint API

• Updates OpenEdgeTable() to use Workspace and Checkpoint-based API
• Replaces legacy Open(work_dir) calls with Open(ckp, desc)
• Changes Dump() call to DumpToDir() for legacy flat-directory dumping

tests/storage/test_edge_table.cc


28. tests/unittest/test_indexer.cc 🧪 Tests +34/-35

LFIndexer tests refactored for Checkpoint API

• Replaces manual checkpoint directory setup with Workspace::CreateCheckpoint()
• Updates indexer test to use Dump(ckp) and Open(ckp, desc) Module interface
• Removes legacy file_names.h include, adds workspace.h include
• Simplifies test directory structure

tests/unittest/test_indexer.cc


29. tests/storage/vertex_table_benchmark.cc 🧪 Tests +9/-5

VertexTable benchmark updated for Checkpoint API

• Updates benchmark to use Checkpoint and ModuleDescriptor for table opening
• Calls ckp.create_dirs() to initialize checkpoint structure

tests/storage/vertex_table_benchmark.cc


30. src/storages/module/CMakeLists.txt ⚙️ Configuration changes +2/-0

Module storage CMake build configuration

• Creates new CMake target neug_storages_module for module factory and base classes

src/storages/module/CMakeLists.txt


31. include/neug/storages/csr/immutable_csr.h ✨ Enhancement +29/-0

Add module fork and type name methods to CSR classes

• Added include for neug/storages/module/type_name.h to support type name generation
• Implemented ForkAsCsr() method in ImmutableCsr class that creates a UUID-based copy and opens
 it
• Implemented ModuleTypeName() method in ImmutableCsr returning type-specific identifier
• Implemented ForkAsCsr() and ModuleTypeName() methods in SingleImmutableCsr class with
 similar functionality

include/neug/storages/csr/immutable_csr.h


32. include/neug/storages/graph/schema.h ✨ Enhancement +32/-1

Make Schema inherit from Module with lifecycle methods

• Added include for neug/storages/module/module.h to inherit from Module base class
• Changed Schema class to inherit from Module base class
• Implemented module lifecycle methods: Open(), Dump(), Close(), and Fork()
• Added ModuleTypeName() override returning "schema" identifier

include/neug/storages/graph/schema.h


33. include/neug/storages/module/type_name.h ✨ Enhancement +60/-0

New type name mapping trait for storage element types

• Created new header file with StorageTypeName template trait for C++ type to string mapping
• Defined DEFINE_STORAGE_TYPE_NAME macro for specializing type names
• Provided specializations for common types: EmptyType, bool, integer types, floating-point
 types, Date, DateTime, Interval, and std::string_view

include/neug/storages/module/type_name.h


34. include/neug/utils/property/table.h ✨ Enhancement +6/-0

Add Fork method to Table class for column duplication

• Added Fork() method declaration that creates forked property columns to UUID paths under
 checkpoint runtime directory
• Method returns a new Table owning the forked columns

include/neug/utils/property/table.h


35. src/storages/CMakeLists.txt ⚙️ Configuration changes +2/-1

Add module subdirectory to CMake build configuration

• Added add_subdirectory(module) to include the new module subdirectory in build
• Updated NEUG_STORAGES_OBJFILES to include $<TARGET_OBJECTS:neug_storages_module> object files

src/storages/CMakeLists.txt


36. include/neug/storages/file_names.h Additional files +0/-4

...

include/neug/storages/file_names.h


37. src/utils/property/table.cc Additional files +52/-0

...

src/utils/property/table.cc


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented Mar 31, 2026

Code Review by Qodo

🐞 Bugs (7) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Checkpoint restore drops data 🐞 Bug ≡ Correctness
Description
PropertyGraph::Open(const Checkpoint&) ignores SnapshotMeta descriptor.path and calls legacy
VertexTable/EdgeTable Open with work_dir=ckp.path(), so they look for data under
checkpoint_dir(ckp.path()) ("…/checkpoint-�NNNNN/checkpoint/") instead of the dumped UUID
directories, yielding empty tables on restart.
Code

src/storages/graph/property_graph.cc[R825-866]

+    vertex_tables_.emplace_back(schema_.get_vertex_schema(i));
+    auto v_label = schema_.get_vertex_label_name(static_cast<label_t>(i));
+    auto key = SnapshotMeta::vertex_table_key(v_label);
+    if (auto desc = meta.module(key)) {
+      vertex_tables_.back().Open(ckp.path(), desc->memory_level);
+    }
+    auto v_size = vertex_tables_[i].Size();
+    vertex_tables_[i].EnsureCapacity(v_size < 4096 ? 4096
+                                                   : v_size + v_size / 4);
+    vertex_capacities[i] = vertex_tables_[i].Capacity();
+  }
+
+  // ── Edge tables ───────────────────────────────────────────────────────────
+  for (size_t src_i = 0; src_i < vertex_label_total_count_; ++src_i) {
+    if (!schema_.vertex_label_valid(src_i))
+      continue;
+    auto src_label = schema_.get_vertex_label_name(static_cast<label_t>(src_i));
+    for (size_t dst_i = 0; dst_i < vertex_label_total_count_; ++dst_i) {
+      if (!schema_.vertex_label_valid(dst_i))
+        continue;
+      auto dst_label =
+          schema_.get_vertex_label_name(static_cast<label_t>(dst_i));
+      for (size_t e_i = 0; e_i < edge_label_total_count_; ++e_i) {
+        if (!schema_.edge_label_valid(e_i))
+          continue;
+        auto edge_label =
+            schema_.get_edge_label_name(static_cast<label_t>(e_i));
+        if (!schema_.exist(src_label, dst_label, edge_label))
+          continue;
+        size_t index = schema_.generate_edge_label(src_i, dst_i, e_i);
+        auto key =
+            SnapshotMeta::edge_table_key(src_label, edge_label, dst_label);
+        EdgeTable et(schema_.get_edge_schema(src_i, dst_i, e_i));
+        if (auto desc = meta.module(key)) {
+          et.Open(ckp.path(), desc->memory_level);
+        }
+        auto e_size = et.Size();
+        size_t e_cap = e_size < 4096 ? 4096 : e_size + (e_size + 4) / 5;
+        et.EnsureCapacity(vertex_capacities[src_i], vertex_capacities[dst_i],
+                          e_cap);
+        edge_tables_.emplace(index, std::move(et));
+      }
Evidence
PropertyGraph checkpoint-open loads Schema from SnapshotMeta, but for vertex/edge tables it calls
the legacy Open(work_dir, level) with work_dir=ckp.path(), and does not use the descriptor.path that
Dump stored. Legacy Open derives snapshot paths via checkpoint_dir(work_dir), which (per
file_names.h) appends "/checkpoint/" to work_dir, so it will not find data dumped under
ckp.runtime_dir()/UUID. NeugDB now prefers LatestCheckpoint() at startup, amplifying this into
user-visible silent data loss after a restart.

src/storages/graph/property_graph.cc[795-866]
src/storages/graph/property_graph.cc[1078-1156]
src/storages/graph/vertex_table.cc[83-110]
src/main/neug_db.cc[228-244]
include/neug/storages/file_names.h[157-184]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`PropertyGraph::Open(const Checkpoint&)` currently restores vertex/edge tables by calling the legacy `Open(work_dir, memory_level)` with `work_dir=ckp.path()`, ignoring the `ModuleDescriptor` paths saved in `SnapshotMeta`. This makes the restore path search in the wrong directory (`checkpoint_dir(ckp.path())`) and results in empty vertex/edge data after restart.

### Issue Context
- `Dump()` persists module descriptors into `SnapshotMeta`, and some Dumps write into UUID dirs under `ckp.runtime_dir()`.
- Legacy `Open(work_dir, level)` expects a *workspace-style* root and derives snapshot locations using `checkpoint_dir(work_dir)`.

### Fix Focus Areas
- src/storages/graph/property_graph.cc[795-866]
- include/neug/storages/graph/vertex_table.h[124-136]
- include/neug/storages/graph/edge_table.h[54-66]
- src/storages/graph/vertex_table.cc[26-61]
- src/storages/graph/edge_table.cc[557-608]

### What to change
1. Implement `VertexTable::Open(const Checkpoint&, const ModuleDescriptor&)` and `EdgeTable::Open(const Checkpoint&, const ModuleDescriptor&)` to open from `descriptor.path` (the snapshot directory) + checkpoint runtime context.
2. In `PropertyGraph::Open(const Checkpoint&)`, call those checkpoint-based `Open(ckp, *desc)` methods instead of the legacy `Open(ckp.path(), desc->memory_level)`.
3. Ensure `Dump()` and `Open()` agree on whether snapshot data lives under `ckp.snapshot_dir()` or UUID dirs under `ckp.runtime_dir()`; pick one convention and apply consistently.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. tmp_dir runtime nesting 🐞 Bug ☼ Reliability
Description
Code passes Checkpoint::runtime_dir() into file_names.h tmp_dir(), but tmp_dir() itself appends
"/runtime/tmp/"; this creates paths like ".../runtime/runtime/tmp" and breaks tmp cleanup/creation
for checkpoint opens and dumps.
Code

src/storages/graph/property_graph.cc[R811-816]

+  // Clean up any leftover tmp dir inside the checkpoint's runtime dir
+  auto tmp = tmp_dir(ckp.runtime_dir());
+  if (std::filesystem::exists(tmp)) {
+    remove_directory(tmp);
+  }
+  std::filesystem::create_directories(tmp_dir(ckp.runtime_dir()));
Evidence
tmp_dir(work_dir) is defined as runtime_dir(work_dir) + "tmp/", and runtime_dir(work_dir)
appends "/runtime/". Checkpoint::runtime_dir() already returns path_ + "/runtime" (a runtime
directory), so tmp_dir(ckp.runtime_dir()) becomes ckp.path()/runtime/runtime/tmp/. This same
misuse also exists in LFIndexer::Open, causing it to create and use nested runtime directories.

src/storages/graph/property_graph.cc[811-817]
src/storages/graph/property_graph.cc[1149-1152]
include/neug/storages/file_names.h[169-184]
include/neug/storages/workspace.h[68-75]
include/neug/utils/id_indexer.h[250-263]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`tmp_dir()`/`runtime_dir()` helpers in `file_names.h` expect a *work_dir root*, but several new checkpoint-based paths pass `ckp.runtime_dir()` into them, producing `.../runtime/runtime/tmp` and causing tmp directory management to break.

### Issue Context
- `Checkpoint::runtime_dir()` returns `.../checkpoint-NNNNN/runtime`.
- `tmp_dir(work_dir)` returns `work_dir + "/runtime/tmp/"`.

### Fix Focus Areas
- src/storages/graph/property_graph.cc[811-817]
- src/storages/graph/property_graph.cc[1149-1152]
- include/neug/utils/id_indexer.h[250-256]
- include/neug/storages/file_names.h[169-184]
- include/neug/storages/workspace.h[68-75]

### What to change
1. Replace `tmp_dir(ckp.runtime_dir())` calls with either:
  - `ckp.runtime_dir() + "/tmp"` (direct composition), **or**
  - `tmp_dir(ckp.path())` if you want to keep using `file_names.h` conventions.
2. In LFIndexer::Open and other modules, pass `ckp.path()` as the `work_dir` argument to legacy helpers (`tmp_dir`, `runtime_dir`) rather than `ckp.runtime_dir()`.
3. Update cleanup in `PropertyGraph::Dump` to remove the correct tmp directory (the one actually used during Open/Dump).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Checkpoint Open always throws 🐞 Bug ≡ Correctness
Description
VertexTable::Open(Checkpoint, ModuleDescriptor) and EdgeTable::Open(Checkpoint, ModuleDescriptor)
unconditionally throw NOT_IMPLEMENTED, but tests and benchmarks now call these overloads, causing
immediate runtime failures.
Code

include/neug/storages/graph/vertex_table.h[R124-130]

+  // Module interface (Checkpoint-based) — the single public Open.
+  void Open(const Checkpoint& ckp,
+            const ModuleDescriptor& descriptor) override {
+    THROW_NOT_IMPLEMENTED_EXCEPTION(
+        "Checkpoint-based Open is not supported for VertexTable. Use "
+        "Open(work_dir, memory_level) instead.");
+  }
Evidence
Both VertexTable and EdgeTable explicitly throw in their checkpoint-based Open overrides. The
updated tests/benchmarks call these overloads (new_table.Open(ckp, desc) / `edge_table->Open(ckp,
desc)`), so they will fail at runtime.

include/neug/storages/graph/vertex_table.h[124-136]
include/neug/storages/graph/edge_table.h[54-66]
tests/storage/test_vertex_table.cc[155-190]
tests/storage/test_edge_table.cc[138-156]
tests/storage/vertex_table_benchmark.cc[68-75]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`VertexTable::Open(const Checkpoint&, const ModuleDescriptor&)` and `EdgeTable::Open(const Checkpoint&, const ModuleDescriptor&)` currently throw, but new tests/benchmarks call them. This breaks the new checkpoint-based API and CI.

### Issue Context
The PR introduces `Module`/`Checkpoint` as the persistence interface; tests were updated accordingly.

### Fix Focus Areas
- include/neug/storages/graph/vertex_table.h[124-136]
- include/neug/storages/graph/edge_table.h[54-66]
- tests/storage/test_vertex_table.cc[155-190]
- tests/storage/test_edge_table.cc[138-156]
- tests/storage/vertex_table_benchmark.cc[68-75]

### What to change
Option A (recommended):
1. Implement checkpoint-based `Open(ckp, desc)` for both VertexTable and EdgeTable (reading from `desc.path` and using `ckp.runtime_dir()` for mutable working files).
2. Keep legacy `Open(work_dir, level)` as an internal/legacy helper.

Option B (if checkpoint Open is intentionally unsupported):
1. Revert/update tests and benchmark to use legacy `Open(work_dir, level)`.
2. Update PropertyGraph header/docs to avoid claiming checkpoint-based Open uses Module::Open for these modules.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (1)
4. Column descriptor contract broken 🐞 Bug ≡ Correctness
Description
ColumnBase::Dump writes to ckp.runtime_dir()/uuid and sets ModuleDescriptor.path to that file
path, but ColumnBase::Open expects desc.path to be a directory and requires extra["name"]; Dump
never sets the name, so a dumped column cannot be reopened via Module::Open.
Code

include/neug/utils/property/column.h[R96-115]

+  void Open(const Checkpoint& ckp, const ModuleDescriptor& desc) override {
+    opened_prefix_ = desc.get("name").value_or("");
+    opened_dir_ = desc.path;
+    open(opened_prefix_, desc.path, ckp.runtime_dir());
+  }
+
+  /**
+   * @brief Dump the column to a UUID-named path under ckp.runtime_dir().
+   */
+  ModuleDescriptor Dump(const Checkpoint& ckp) override {
+    std::string uuid = generate_uuid();
+    std::string fork_path = ckp.runtime_dir() + "/" + uuid;
+    dump(fork_path);
+    ModuleDescriptor d;
+    d.path = fork_path;
+    d.type = "column";
+    d.module_type = ModuleTypeName();
+    d.version = 1;
+    d.memory_level = MemoryLevel::kSyncToFile;
+    return d;
Evidence
TypedColumn::open uses snapshot_dir + "/" + name as the base path, so desc.path must be a
directory and name must be provided. The new ColumnBase::Open reads desc.get("name"), but
ColumnBase::Dump doesn't set it and dumps to a UUID path that is not combined with any name prefix.

include/neug/utils/property/column.h[52-157]
include/neug/utils/property/column.h[91-116]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ColumnBase::Open` expects `ModuleDescriptor` to provide:
- `desc.path` = directory containing the snapshot files
- `extra["name"]` = file prefix

But `ColumnBase::Dump` currently:
- calls `dump(fork_path)` where `fork_path` is a UUID path
- sets `d.path = fork_path`
- never sets `d.set("name", ...)`

So the descriptor produced by Dump cannot be consumed by Open.

### Fix Focus Areas
- include/neug/utils/property/column.h[91-116]
- include/neug/utils/property/column.h[52-57]

### What to change
1. Make `Dump()` create a UUID *directory* under `ckp.runtime_dir()`.
2. Dump files as `data_dir + "/" + <column_prefix>` and set:
  - `d.path = data_dir`
  - `d.set("name", <column_prefix>)`
3. Decide what `<column_prefix>` is:
  - reuse `opened_prefix_` if the column was opened before, or
  - generate a stable per-column name and persist it.
4. Update `Open()` to pass a `work_dir` consistent with the rest of the codebase (avoid passing `ckp.runtime_dir()` into legacy helpers that expect a root).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

5. Descriptors miss module_type 🐞 Bug ⚙ Maintainability
Description
Schema/VertexTable/EdgeTable/LFIndexer Dump implementations omit ModuleDescriptor::module_type
even though the Module contract says Dump should populate it for ModuleFactory reconstruction,
making factory-based restore impossible for those modules.
Code

src/storages/graph/vertex_table.cc[R90-96]

+  ModuleDescriptor desc;
+  desc.path = target_dir;
+  desc.size = static_cast<uint64_t>(LidNum());
+  desc.type = "vertex_table";
+  desc.version = 1;
+  desc.memory_level = memory_level_;
+  return desc;
Evidence
Module::ModuleTypeName documentation states the value is written into
ModuleDescriptor::module_type by Dump implementations. However, several Dumps only set desc.type
and never set desc.module_type, while ModuleFactory is designed to create modules by
descriptor.module_type.

include/neug/storages/module/module.h[92-103]
src/storages/graph/schema.cc[2284-2309]
src/storages/graph/vertex_table.cc[83-97]
src/storages/graph/edge_table.cc[610-635]
include/neug/utils/id_indexer.h[273-287]
include/neug/storages/module/module_factory.h[49-53]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Several `Dump()` implementations do not set `ModuleDescriptor::module_type`, but ModuleFactory expects to create modules using `descriptor.module_type`.

### Issue Context
The `Module` interface explicitly documents that `ModuleTypeName()` is written into `ModuleDescriptor::module_type`.

### Fix Focus Areas
- src/storages/graph/schema.cc[2301-2308]
- src/storages/graph/vertex_table.cc[90-96]
- src/storages/graph/edge_table.cc[628-634]
- include/neug/utils/id_indexer.h[280-286]
- include/neug/storages/module/module.h[92-103]

### What to change
1. In each Dump, add `desc.module_type = ModuleTypeName();`.
2. Keep `desc.type` as the coarse tag if desired, but ensure `module_type` is the factory key.
3. Add an assertion/log in ModuleFactory restore paths (wherever descriptors are consumed) if `module_type` is empty to catch future regressions.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


6. CSR fork ignores MemoryLevel 🐞 Bug ➹ Performance
Description
CSR ForkAsCsr implementations ignore the requested MemoryLevel and always dump+open via the
sync-to-file open() path, so callers cannot obtain in-memory or hugepage forks as requested.
Code

include/neug/storages/csr/mutable_csr.h[R192-199]

+  std::unique_ptr<CsrBase> ForkAsCsr(const Checkpoint& ckp,
+                                     MemoryLevel level) const override {
+    std::string uuid = generate_uuid();
+    const_cast<MutableCsr<EDATA_T>*>(this)->dump(uuid, ckp.runtime_dir());
+    auto new_csr = std::make_unique<MutableCsr<EDATA_T>>();
+    new_csr->open(uuid, ckp.runtime_dir(), ckp.runtime_dir());
+    return new_csr;
+  }
Evidence
ForkAsCsr receives MemoryLevel level but does not branch on it; it always dumps to runtime_dir and
calls open(...), which is the sync-to-file path in CSR implementations. This defeats the purpose
of MemoryLevel-controlled forks (used by EdgeTable::Fork and Module::Fork).

include/neug/storages/csr/mutable_csr.h[192-203]
include/neug/storages/csr/immutable_csr.h[118-125]
src/storages/csr/mutable_csr.cc[39-78]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`ForkAsCsr(const Checkpoint&, MemoryLevel)` ignores `level` and always reopens using `open(...)` (sync-to-file style). This prevents callers from requesting in-memory/hugepage forks.

### Fix Focus Areas
- include/neug/storages/csr/mutable_csr.h[192-203]
- include/neug/storages/csr/immutable_csr.h[118-125]
- include/neug/storages/csr/csr_base.h[141-147]

### What to change
1. After dumping to a UUID prefix under `ckp.runtime_dir()`, reopen based on `level`:
  - `kSyncToFile`: `open(uuid, <snapshot_dir>, ckp.path())` (or consistent work_dir)
  - `kInMemory`: `open_in_memory(<dir> + "/" + uuid)`
  - `kHugePagePrefered`: `open_with_hugepages(<dir> + "/" + uuid)`
2. Make sure the chosen `<dir>` and `work_dir` parameters match the tmp_dir/runtime_dir conventions in `file_names.h` (avoid passing `ckp.runtime_dir()` as a work_dir root).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

7. UUID generation unchecked 🐞 Bug ☼ Reliability
Description
generate_uuid() does not check whether opening or reading "/proc/sys/kernel/random/uuid" succeeded;
it may return an empty string which then becomes a directory/file name, causing collisions or writes
into unexpected paths.
Code

include/neug/storages/workspace.h[R34-39]

+inline std::string generate_uuid() {
+  std::ifstream f("/proc/sys/kernel/random/uuid");
+  std::string uuid;
+  std::getline(f, uuid);
+  return uuid;
+}
Evidence
The function assumes the /proc file exists and is readable. If the ifstream fails or getline fails,
uuid remains empty and is returned unchecked; many Dump/Fork paths use this as a directory
component.

include/neug/storages/workspace.h[34-39]
src/storages/graph/property_graph.cc[1078-1147]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`generate_uuid()` can return an empty string if `/proc/sys/kernel/random/uuid` cannot be read, but callers use it to form directory/file paths.

### Fix Focus Areas
- include/neug/storages/workspace.h[34-39]

### What to change
1. Check `f.is_open()` and the result of `std::getline`.
2. If read fails, either:
  - throw a clear exception, or
  - fallback to a portable UUID generator (e.g., std::random_device + formatting).
3. Consider trimming whitespace/newlines from the UUID string before returning.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Comment thread src/storages/graph/property_graph.cc Outdated
Comment thread src/storages/graph/property_graph.cc Outdated
Comment thread include/neug/storages/graph/vertex_table.h Outdated
Comment thread include/neug/utils/property/column.h Outdated
@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 5 times, most recently from 516e96f to 5bc7c6d Compare April 2, 2026 06:44
@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 5 times, most recently from b0c0529 to 382166b Compare April 14, 2026 03:30
Comment thread include/neug/storages/csr/mutable_csr.h Outdated
@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 2 times, most recently from ac894ec to 347749a Compare April 15, 2026 02:22
@zhanglei1949 zhanglei1949 marked this pull request as ready for review April 15, 2026 07:16
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 2 times, most recently from 46823b4 to 96eb6ed Compare April 16, 2026 02:36
@zhanglei1949 zhanglei1949 requested review from Copilot April 16, 2026 02:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors persistent storage management to revolve around Workspace + Checkpoint + ModuleDescriptor (module-based Open/Dump/Fork), and updates graph/table/CSR codepaths and tests accordingly (Fix #133).

Changes:

  • Introduce checkpointed workspace layout (Workspace, Checkpoint, SnapshotMeta, ModuleDescriptor) and module registry (ModuleFactory).
  • Migrate PropertyGraph, VertexTable, EdgeTable, Table/Column, and CSR implementations to the new module-driven Open/Dump/Fork API.
  • Update/extend unit & storage tests for the new checkpoint/module workflow and add new fork/dump matrix tests.

Reviewed changes

Copilot reviewed 67 out of 67 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests/utils/test_table.cc Updates table tests to use Workspace/Checkpoint and descriptors instead of hardcoded dirs.
tests/unittest/utils.h Adds standard library includes needed by updated unit test helpers.
tests/unittest/test_column.cc New typed tests for column fork isolation + dump/open matrix.
tests/unittest/logical_delete_test.cc Migrates PropertyGraph open to checkpoint-based API.
tests/unittest/CMakeLists.txt Adds new test_column target.
tests/storage/vertex_table_benchmark.cc Updates benchmark to use checkpoint/module descriptors.
tests/storage/test_vertex_table.cc Migrates vertex table tests to checkpoint-based Open/Dump.
tests/storage/test_set_property.cc Includes updated unittest utils header.
tests/storage/test_property_graph.cc Migrates property graph test fixture to checkpoint-based Open.
tests/storage/test_open_graph.cc Includes updated unittest utils header and tweaks logging.
tests/storage/test_mutable_csr.cc Adds fork/dump tests and migrates CSR tests to checkpoint-based API.
tests/storage/test_immutable_csr.cc Migrates immutable CSR tests to checkpoint-based API and adds workspace usage.
tests/storage/test_export.cc Includes updated unittest utils header.
tests/storage/test_ddl.cc Includes updated unittest utils header.
tests/storage/test_csr_stream_ops.cc Migrates CSR stream tests to checkpoint-based API and workspace-managed dirs.
tests/storage/test_csr_batch_ops.cc Migrates CSR batch tests to checkpoint-based Open/Dump.
tests/storage/alter_property_test.cc Migrates alter-property test to open graph via checkpoint.
tests/execution/test_value.cc Minor formatting cleanup for EXPECT_DEATH.
src/utils/uuid.cc Adds UUID generator implementation.
src/utils/property/table.cc Refactors Table to module-based Open/Dump/Fork and registers column module types.
src/utils/id_indexer.cc Registers LF indexer module type.
src/storages/module/module_factory.cc Adds module factory implementation and registry.
src/storages/module/CMakeLists.txt Adds object library for module sources.
src/storages/loader/abstract_property_graph_loader.cc Loader now creates a workspace+checkpoint and uses checkpoint meta schema.
src/storages/graph/workspace.cc New workspace implementation that discovers/creates checkpoints.
src/storages/graph/vertex_timestamp.cc Migrates vertex timestamp to module-based Open/Dump/Fork and registers module.
src/storages/graph/vertex_table.cc Migrates vertex table to module-based Open/Dump and descriptor submodules.
src/storages/graph/snapshot_meta.cc Adds JSON-persisted snapshot meta (schema + module inventory).
src/storages/graph/schema.cc Adds Schema JSON (de)serialization via YAML<->JSON conversion and fixes edge-strategy labels.
src/storages/graph/property_graph.cc Migrates graph Open/Dump to checkpoint/meta-driven module open.
src/storages/graph/module_descriptor.cc Implements ModuleDescriptor JSON and submodule tree handling.
src/storages/graph/edge_table.cc Migrates edge table to module-based Open/Dump and property mutations using checkpoint.
src/storages/graph/checkpoint.cc New checkpoint implementation (dirs, meta persistence, commit semantics, cleanup).
src/storages/csr/mutable_csr.cc Migrates mutable CSR variants to module API, implements descriptor-based dump/open, registers module types.
src/storages/csr/immutable_csr.cc Migrates immutable CSR variants to module API, implements dump/open, registers module types.
src/storages/container/mmap_container.cc Improves mmap resize error and adds Fork for containers.
src/storages/container/container_utils.cc Improves error logging when tmp path is missing in disk-backed mode.
src/storages/CMakeLists.txt Adds module/ subdir into storages build.
src/server/neug_db_service.cc Updates session pool init after WAL path handling changes.
src/main/neug_db.cc Migrates DB open/ingest/checkpoint flow to workspace+checkpoint model.
include/neug/utils/uuid.h Declares UUIDGenerator.
include/neug/utils/property/table.h Updates Table API to checkpoint/module descriptor and adds Fork.
include/neug/utils/property/column.h Migrates columns to Module interface and implements Open/Dump/Fork/type names.
include/neug/utils/id_indexer.h Migrates LFIndexer to Module interface and adds Open/Dump/Fork.
include/neug/storages/workspace.h Declares Workspace API.
include/neug/storages/snapshot_meta.h Declares SnapshotMeta (schema + modules) and key helpers.
include/neug/storages/module_descriptor.h Declares ModuleDescriptor with JSON + submodules.
include/neug/storages/module/type_name.h Adds StorageTypeName<> trait for module type strings.
include/neug/storages/module/module_factory.h Declares ModuleFactory + registration macro.
include/neug/storages/module/module.h Declares the module lifecycle interface.
include/neug/storages/loader/abstract_property_graph_loader.h Loader now owns a Workspace and seeds checkpoint meta schema.
include/neug/storages/graph/vertex_timestamp.h VertexTimestamp now implements Module.
include/neug/storages/graph/vertex_table.h VertexTable migrated to descriptor-based Open/Dump; indexer is now a pointer.
include/neug/storages/graph/schema.h Adds Schema JSON API declarations.
include/neug/storages/graph/property_graph.h Switches graph open/dump API to checkpoint-based, stores Checkpoint*.
include/neug/storages/graph/edge_table.h Migrates EdgeTable API to checkpoint-based Open/Dump/Close and property mutations.
include/neug/storages/file_names.h Removes temp_checkpoint_dir helper (no longer used with checkpoint model).
include/neug/storages/csr/mutable_csr.h Migrates CSR base implementations to module API and adds Fork/type names.
include/neug/storages/csr/immutable_csr.h Migrates immutable CSR to module API and defines module type names.
include/neug/storages/csr/csr_base.h Makes CSR base inherit Module and adds fork_vertex hook.
include/neug/storages/container/mmap_container.h Declares container Fork.
include/neug/storages/container/i_container.h Adds container Fork virtual method.
include/neug/storages/checkpoint.h Declares Checkpoint class and helpers (commit/meta/containers).
include/neug/server/session_pool.h SessionPool now derives WAL location from graph.checkpoint().
include/neug/main/neug_db.h NeugDB::work_dir() now returns workspace db dir; adds Workspace member.
Comments suppressed due to low confidence (12)

src/storages/graph/property_graph.cc:1

  • SnapshotMeta::edge_table_key is declared as (src_label, edge_label, dst_label) in include/neug/storages/snapshot_meta.h, but here it’s called as (src_label, dst_label, edge_label). This will break meta lookups (and likely won’t compile if overloads don’t match). Align the argument order either in the helper signature or at all call sites.
    src/storages/graph/vertex_table.cc:1
  • Table was refactored to Open/Dump/Fork and no longer shows a close() API in the updated header diff, so table_->close() is very likely a compile error. Either reintroduce a Table::Close() method (closing all columns’ modules) and call that, or update all callers to the new lifecycle method name consistently.
    src/storages/graph/edge_table.cc:1
  • Same issue as VertexTable: table_->close() is inconsistent with the refactored Table API and is likely missing now. Consider adding and using Table::Close() consistently from both VertexTable and EdgeTable.
    src/utils/property/table.cc:1
  • add_columns ignores the memory_level argument and always opens new columns as kInMemory. This changes behavior for disk-backed or hugepage configurations. Use the provided memory_level (and a stable per-column sub-descriptor/path strategy) so added columns follow the table’s storage policy.
    src/utils/property/table.cc:1
  • add_columns ignores the memory_level argument and always opens new columns as kInMemory. This changes behavior for disk-backed or hugepage configurations. Use the provided memory_level (and a stable per-column sub-descriptor/path strategy) so added columns follow the table’s storage policy.
    src/storages/graph/vertex_timestamp.cc:1
  • When desc is default-constructed, desc.path is empty; std::filesystem::exists(\"\") can resolve to the current directory on some platforms/libstdc++ versions, causing an unintended load_ts call with an empty path. Guard explicitly with !desc.path.empty() before checking exists().
    tests/utils/test_table.cc:1
  • This uses ::getpid() but the file diff doesn’t show an include for <unistd.h> (or an equivalent platform header). This can fail to compile on some toolchains. Add the appropriate include or replace getpid() usage with a portable alternative.
    tests/storage/vertex_table_benchmark.cc:1
  • VertexTable::Open expects the ModuleDescriptor to contain sub-modules (indexer, property_table, vertex_timestamp). Providing only module_type+path will result in empty sub-descriptors and can cause runtime failures (e.g., LFIndexer needing key type). For a benchmark, either (1) create a new empty table by passing a default descriptor but ensuring the indexer is initialized with pk type and opened accordingly, or (2) load a real descriptor produced by VertexTable::Dump.
    src/storages/graph/checkpoint.cc:1
  • This constructs a new UUIDGenerator (and thus a new std::random_device/PRNG) on every loop iteration. Consider reusing a static/shared generator (e.g., generate_uuid() already uses a static generator) to reduce overhead, especially since this is called frequently for runtime object creation.
    src/utils/uuid.cc:1
  • <regex> is included but not used in this translation unit. Removing unused includes reduces compile time and avoids confusion about intended validation logic.
    src/storages/graph/checkpoint.cc:1
  • Checkpoint meta updates now drive deletion of orphaned snapshot files. This is correctness-critical and potentially destructive; please add focused tests that (1) write meta referencing a subset of UUID files, (2) call UpdateMeta, and (3) verify only unreferenced UUID files are deleted while referenced ones remain.
    src/storages/graph/checkpoint.cc:1
  • Checkpoint meta updates now drive deletion of orphaned snapshot files. This is correctness-critical and potentially destructive; please add focused tests that (1) write meta referencing a subset of UUID files, (2) call UpdateMeta, and (3) verify only unreferenced UUID files are deleted while referenced ones remain.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/main/neug_db.cc Outdated
Comment thread include/neug/storages/graph/vertex_table.h Outdated
Comment thread include/neug/storages/checkpoint.h Outdated
@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 2 times, most recently from de32e4d to 31c325f Compare June 2, 2026 07:56
@zhanglei1949 zhanglei1949 requested a review from Copilot June 2, 2026 08:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@zhanglei1949 zhanglei1949 force-pushed the zl/workspace-ref branch 3 times, most recently from 5d82878 to e5202af Compare June 3, 2026 06:15
zhanglei1949 and others added 10 commits June 3, 2026 17:03
refine module_store and module_desc

refactor snapshot_meta/checkpoint, remove unncessary changes

remove module store

minor ref

ref wal apply

refine

refine checkpoint manage

fix ci

minor

refine

fix

minor refine

refine module register macro

fix format

rm file_names.h

minor refine

minor refine

supporing fast dump

remove some cmments

TBD: add dirty flag for MutableCSR, to avoid redumping the data

minor fix

remove method generate_uuid

fix test

revert is_dirty_ flag and compute md5 for mutable_csr's nbr_list before dump

fix test

check whether data_buffer is modified at column level

refine

remove unused field last_descriptor

cherry-pick c964e58

fix issues reported by aone copilot

two fold check

remove some comment

revert changes to indexer_ in VertexTable

minor

Committed-by: xiaolei.zl from Dev container

Committed-by: xiaolei.zl from Dev container

refine test cases

fix

iterate edge schemas from schema

Remove dummy function

remove fork related code, separate module open from v/etable construction

fix

format

fix test
fix

fix pytest

remove dead code

rename func

make indexer's indices column, not raw container

format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants