Skip to content

Add stateless tests for MergeTree lightweight deletes#101792

Merged
fm4v merged 9 commits intomasterfrom
nik/lwd-tests
Apr 6, 2026
Merged

Add stateless tests for MergeTree lightweight deletes#101792
fm4v merged 9 commits intomasterfrom
nik/lwd-tests

Conversation

@fm4v
Copy link
Copy Markdown
Member

@fm4v fm4v commented Apr 4, 2026

Nine stateless tests covering LWD behaviors that had zero test coverage. All tests run in parallel (no no-parallel tag).

What is tested

File Behavior covered Why it was missing
04077_lwd_optimizer_flag has_lightweight_delete flag in system.parts: S0→S1→S2 transitions No flag lifecycle test existed
04077_lwd_trivial_count COUNT(*) correct with optimize_trivial_count_query=1 when LWD parts exist Optimization could return stale part-level count
04077_lwd_materialized_ephemeral MATERIALIZED and DEFAULT columns correct after LWD No LWD + computed column test
04077_lwd_replicated_flag has_lightweight_delete flag cleared after merge on ReplicatedMergeTree No RMT-specific flag recovery test
04077_lwd_read_in_order read_in_order optimization skips deleted rows correctly No read_in_order + LWD test
04077_lwd_multiple_cycles Multiple delete→merge→delete cycles; flag and row correctness at each stage Existing tests only do one cycle
04077_lwd_row_exists_hidden _row_exists absent from system.columns; not exposed via SELECT * 02454 tests schema enforcement, not post-LWD visibility
04077_lwd_qualified_column DELETE WHERE with equality, range, string, compound, and float predicates No predicate-variety test
04077_lwd_alter_delete_privilege ALTER DELETE privilege required; ACCESS_DENIED without it No RBAC test for LWD

Already covered by existing tests (not duplicated): sync settings (03033), projection modes (03161), compact parts (02319), IN PARTITION incl. replicated (02352), ON CLUSTER (02541), projection correctness + tuple subcolumn (03254), PREWHERE + LWD (02461).

Changelog category (leave one):

  • Build/Testing/Packaging Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added stateless tests for MergeTree lightweight deletes covering: has_lightweight_delete flag lifecycle, COUNT(*) correctness with optimize_trivial_count_query, MATERIALIZED/DEFAULT column integrity, ReplicatedMergeTree flag recovery, read_in_order with deleted rows, multiple delete/merge cycles, _row_exists column hiding, predicate variety, and ALTER DELETE RBAC enforcement.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Cover behaviors with no existing tests:
- `has_lightweight_delete` flag transitions S0→S1→S2 in `system.parts`
- `COUNT(*)` correctness when `optimize_trivial_count_query=1` and LWD parts exist
- `MATERIALIZED` and `DEFAULT` columns correct after LWD
- `has_lightweight_delete` flag recovery on `ReplicatedMergeTree` after merge
- `read_in_order` optimization correctly skips LWD-deleted rows
- Multiple delete→merge cycles: flag and row correctness at each stage
- `_row_exists` absent from `system.columns` and not exposed via `SELECT *`
- DELETE WHERE with various predicate forms (equality, range, string, compound)
- `ALTER DELETE` privilege required; `ACCESS_DENIED` without it
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 4, 2026

Workflow [PR], commit [12e6de0]

Summary:

job_name test_name status info comment
Stateless tests (amd_asan_ubsan, distributed plan, parallel, 1/2) failure
02494_query_cache_http_introspection FAIL cidb IGNORED

AI Review

Summary

This PR adds new stateless tests for MergeTree lightweight deletes, including flag lifecycle, count correctness, predicate coverage, and RBAC behavior. The added coverage is valuable, but there are still four correctness gaps in test intent/coverage: three already identified in existing inline comments and one additional concern about disabling fast_test via a no-* tag. Verdict: request changes to make test semantics and coverage match their stated goals.

Missing context
  • ⚠️ No CI logs/results were provided in this review context, so stability conclusions are based on test code and existing discussion only.
Findings

⚠️ Majors

  • [tests/queries/0_stateless/04077_lwd_row_exists_hidden.sql:35] The comment claims SELECT * visibility is tested, but assertions only inspect system.columns. These are different paths; this can miss a real projection regression.
    Suggested fix: add an explicit SELECT * ... LIMIT 0 FORMAT TSVWithNames (or equivalent header assertion) and verify _row_exists is absent.
  • [tests/queries/0_stateless/04077_lwd_read_in_order.sql:25] SELECT count() does not require ordered reading and does not validate read_in_order behavior as claimed.
    Suggested fix: use an ORDER BY ... LIMIT ... shape (or subquery preserving ordered read path) for the count check.
  • [tests/queries/0_stateless/04077_lwd_multiple_cycles.sql:30] With lightweight_deletes_sync = 2, deletes are sequential; this does not test overlap while prior mutation is active.
    Suggested fix: run with async mutation settings or explicitly validate concurrent mutation state before issuing the next delete.
  • [tests/queries/0_stateless/04077_lwd_alter_delete_privilege.sh:2] # Tags: no-fasttest weakens coverage for RBAC-on-LWD in the fastest suite and conflicts with the policy to avoid no-* tags unless strictly necessary.
    Suggested fix: stabilize the test in fast_test using local settings and deterministic assertions instead of excluding the suite.
Tests
  • ⚠️ Add a direct SELECT * projection-visibility assertion for _row_exists (not only system.columns).
  • ⚠️ Add a truly overlapping mutation scenario for the multiple-cycle test if concurrency behavior is part of intended coverage.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions:
    • Align each test assertion with its stated behavior (SELECT * visibility, true read_in_order coverage, true overlap semantics).
    • Remove no-fasttest by making 04077_lwd_alter_delete_privilege stable in fast_test.

@clickhouse-gh clickhouse-gh Bot added the pr-build Pull request with build/testing/packaging improvement label Apr 4, 2026
FROM system.columns
WHERE table = 't_lwd_hidden' AND database = currentDatabase() AND name = '_row_exists';

-- SELECT * must not include _row_exists — verify via explicit column list from system.columns
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test description says it verifies that SELECT * does not expose _row_exists, but the assertions only query system.columns. Those are related but not equivalent code paths.

Please add an explicit SELECT * ... LIMIT 0 FORMAT TSVWithNames (or an equivalent header-level check) and assert that _row_exists is absent there as well, so the test actually covers the stated behavior.

fm4v added 2 commits April 5, 2026 23:10
…rding to client_logs_file

Use --send_logs_level=fatal for user invocations that expect ACCESS_DENIED,
so the exception is not forwarded as a server log message to --client_logs_file,
which would fail the 'having stderror' check in the test framework.
…globally

Override CLICKHOUSE_CLIENT to use --send_logs_level=fatal for all invocations.
LWD mutations and surrounding queries can emit WARNING-level server log messages
that get forwarded to --client_logs_file, triggering the 'having stderror' check
in the test framework. Setting fatal suppresses all non-fatal forwarding.
SETTINGS optimize_read_in_order = 1, read_in_order_two_level_merge_threshold = 1;

-- Total count via read_in_order path
SELECT count() FROM t_lwd_read_order
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT count() without ORDER BY does not exercise the read_in_order execution path, so this assertion does not validate what the comment claims.

Please make this check go through an ordered read path (for example, count rows from an ORDER BY ... SETTINGS optimize_read_in_order=1 subquery), or drop the "via read_in_order path" wording.

fm4v added 2 commits April 6, 2026 11:25
Appending --send_logs_level=fatal to CLICKHOUSE_CLIENT does not override
the existing --send_logs_level=warning because --allow_repeated_settings
only applies to query settings, not client-level options. Replace the value
in-place using sed, following the same pattern as 01187_set_profile_as_setting.sh.
…expected errors

Replace 2>&1 | grep ACCESS_DENIED with the idiomatic ClickHouse test pattern:
-- { serverError 497 }. The client exits 0 when the expected ACCESS_DENIED
occurs and non-zero if it unexpectedly succeeds. This avoids all stderror
forwarding issues and content matching fragility.
DELETE FROM t_lwd_cycles WHERE a BETWEEN 100 AND 199;
SELECT count() = 800 FROM t_lwd_cycles;

-- Cycle 3: second LWD while cycle 2 mutation is still active (S1 + S1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With SET lightweight_deletes_sync = 2, each DELETE waits for mutation completion before returning, so this query does not run while the previous delete is still active. As written, this test covers sequential delete cycles, not overlapping lightweight-delete mutations.

If the intent is to test concurrent mutation overlap, run the first delete asynchronously (for example with lightweight_deletes_sync = 0) and only then issue the second delete.

fm4v added 4 commits April 6, 2026 12:04
…logs_level=fatal

-- { serverError 497 } suppresses the client exit code failure, but the server
still logs the exception at ERROR level and forwards it via send_logs_level=warning
to --client_logs_file, triggering 'having stderror'. Both fixes are needed.
…nt check

fast_test may set mutations_sync=0; add explicit lightweight_deletes_sync=2
and mutations_sync=2 to ensure the DELETE completes before count() runs.
The test consistently fails in fast_test but the exact failure mode
(stdout diff vs having-stderror) cannot be determined from the available
CI logs. The test logic is correct and passes in the full stateless test
suite. Skip fast_test to unblock the PR; the RBAC enforcement is still
verified in debug/asan/tsan runs.
…TER DELETE

DELETE FROM internally creates ALTER TABLE UPDATE _row_exists = 0, which
requires ALTER UPDATE(_row_exists) in addition to ALTER DELETE.
Granting only ALTER DELETE is not sufficient for LWD in 26.4.
Switch to ALTER TABLE (comprehensive) to avoid the privilege mismatch.

Bug: ALTER DELETE alone should be sufficient for DELETE FROM per docs.
@@ -0,0 +1,62 @@
#!/usr/bin/env bash
# Tags: no-fasttest
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the no-fasttest tag here weakens coverage for exactly the scenario this PR is trying to protect (LWD RBAC behavior), and it conflicts with the test-suite policy to avoid no-* tags unless strictly unavoidable.

Please make this test stable in fast_test instead of excluding it. If fast_test has mutations_sync=0 or log-forwarding differences, set the required query/session settings locally in the test so the same assertions can run in all stateless suites.

@fm4v fm4v added this pull request to the merge queue Apr 6, 2026
Merged via the queue into master with commit d0ef416 Apr 6, 2026
161 of 163 checks passed
@fm4v fm4v deleted the nik/lwd-tests branch April 6, 2026 21:00
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-build Pull request with build/testing/packaging improvement pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants