Skip to content

fix: DROP TABLE no longer leaks across checkpoint boundary (#293, #398)#399

Merged
zhanglei1949 merged 8 commits into
alibaba:mainfrom
zhanglei1949:zl/fix-293-v2
May 25, 2026
Merged

fix: DROP TABLE no longer leaks across checkpoint boundary (#293, #398)#399
zhanglei1949 merged 8 commits into
alibaba:mainfrom
zhanglei1949:zl/fix-293-v2

Conversation

@zhanglei1949
Copy link
Copy Markdown
Member

@zhanglei1949 zhanglei1949 commented May 22, 2026

Summary

Fixes #293 and implements the refactor tracked in #398. Replaces the earlier PR #306 approach with a tighter Drop/checkpoint contract that keeps file-system mutation out of the container layer.

  • Drop() at every layer releases in-memory state only; checkpoint_dir is touched exclusively by Dump(). This preserves the implicit crash-rollback semantics for DROP-before-CHECKPOINT.
  • Open() split into Open() (restore from checkpoint) and Initialize() (fresh table). Replaces the leaky bool load_from_checkpoint = true parameter.
  • Initialize() sweeps tmp_dir for files keyed on the same label name/triple before opening fresh containers. This is what unsticks the same-session DROP+CREATE stale-data path — schema reuses the numeric label_t for a same-name CREATE, but tmp filenames key on the string name.
  • DeleteVertexType and DeleteEdgeType call Drop() before erasing, making the two paths symmetric.
  • IDataContainer::Drop() is intentionally NOT added. In kInMemory / kHugePagePreferred modes a container's path_ points at the checkpoint file, so a generic Drop-deletes-path contract would silently corrupt checkpoint state. See Refactor: tighten DROP TABLE / checkpoint isolation contract #398 for the full rationale.

Issue alibaba#293 reported two bugs from DROP TABLE:
1. DROP TABLE x; CREATE TABLE x; SELECT returned stale data.
2. DROP TABLE x; CHECKPOINT crashed the server.

Root cause: VertexTable/EdgeTable lifecycle was not separated from the
on-disk checkpoint state. Re-CREATEing a same-name label re-mmapped the
prior label's leftover tmp files, and Drop() left the in-memory smart
pointers populated, breaking the next Dump().

Design constraints adopted:

- Drop() at every layer releases in-memory state only; it never touches
  the filesystem. checkpoint_dir is owned exclusively by Dump(). This
  preserves the implicit crash-rollback semantics: a crash between DROP
  and the next CHECKPOINT re-reads the prior checkpoint and effectively
  undoes the DROP.
- Open() is split into Open() (restore from checkpoint) and Initialize()
  (fresh table). Initialize() sweeps tmp_dir for files keyed on the same
  label name/triple before opening, which is what fixes the same-session
  DROP+CREATE stale-data path (schema reuses the numeric label_t for a
  same-name CREATE, but tmp filenames key on the string name).
- Tombstone slots in vertex_tables_ keep the label_t -> vector index
  mapping stable across DROP.
- DeleteVertexType / DeleteEdgeType call Drop() before erasing the edge
  table map entries, so the two paths are symmetric.

The previously proposed virtual IDataContainer::Drop() and per-container
Drop() overrides are not introduced: in kInMemory / kHugePagePreferred
modes a container's path_ points directly at the checkpoint file, so a
generic Drop()-deletes-path_ contract would silently wipe checkpoint
state. Keeping the file-system exit out of the container interface
removes the footgun entirely.

Tests:
- tests/storage/test_checkpoint.cc: DropTableCheckpointTest cases for
  drop+checkpoint, drop+reopen+recreate, and the edge-table mirrors.
- tools/python_bind/tests/test_db_query.py: end-to-end coverage for
  DROP TABLE + recreate visibility.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens the DROP TABLE vs CHECKPOINT contract to prevent dropped tables from leaking persisted state across a checkpoint boundary (fixing #293) and implements the refactor tracked in #398 by separating “restore from checkpoint” vs “fresh initialize” behavior for storage tables.

Changes:

  • Split table bring-up into Open() (restore from checkpoint) vs Initialize() (fresh table) for vertex/edge tables, with tmp-dir sweeping on re-create to avoid same-session stale mmaps.
  • Adjust graph open semantics to tolerate tombstoned (dropped) label slots while preserving label_t → slot mapping.
  • Add/expand regression coverage in both Python bindings and C++ storage checkpoint tests, including typed coverage over multiple memory levels.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tools/python_bind/tests/test_db_query.py Adds Python regression tests for DROP+reCREATE and delete-all-then-reinsert visibility.
tests/storage/test_checkpoint.cc Refactors checkpoint tests into typed suites (memory-level coverage) and adds DROP/CHECKPOINT regression scenarios.
src/storages/graph/vertex_table.cc Adds Initialize() + tmp-file sweeping and routes Open()/Initialize() through a shared openImpl.
src/storages/graph/property_graph.cc Uses Initialize() for CREATE paths, calls Drop() before erasing edge tables, and keeps tombstone slots on open.
src/storages/graph/edge_table.cc Adds Initialize() + tmp sweeping, refactors open path, and introduces Close()/Drop().
src/storages/csr/mutable_csr.cc Makes CSR open tolerate missing snapshot directories by conditionally building the snapshot prefix.
include/neug/storages/graph/vertex_table.h Adds default ctor + Initialize() + shared openImpl declaration.
include/neug/storages/graph/edge_table.h Adds Initialize(), Close(), Drop(), and shared openImpl declaration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/storages/graph/vertex_table.cc
Comment thread src/storages/graph/vertex_table.cc
Comment thread src/storages/graph/edge_table.cc
Comment thread src/storages/csr/mutable_csr.cc
Committed-by: xiaolei.zl from Dev container
Committed-by: xiaolei.zl from Dev container
Comment thread src/storages/graph/edge_table.cc Outdated
meta_->dst_label_name, meta_->edge_label_name,
capacity_, table_idx_);
if (table_cap != capacity_.load()) {
THROW_INVALID_ARGUMENT_EXCEPTION(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is INVALID_ARGUMENT_EXCEPTION appropriate here

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace with Internal error

Comment thread src/storages/graph/edge_table.cc Outdated
// DROP). Do not make this function delete files; doing so requires writing
// DROP to the WAL first or that rollback invariant will break.
void EdgeTable::Drop() {
out_csr_.reset();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a difference in usage between Drop and Close?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that for LFIndexer, Table, VertexTable, EdgeTable, the logic of drop and close are actually the same. Maybe we could unify two method into one.

liulx20
liulx20 previously approved these changes May 25, 2026
@zhanglei1949 zhanglei1949 merged commit 915605e into alibaba:main May 25, 2026
12 checks passed
@lnfjpt lnfjpt mentioned this pull request May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] DROP TABLE does not clear checkpointed data; stale nodes visible after CREATE TABLE IF NOT EXISTS

3 participants