ReplicatedMergeTree: CHECK TABLE fails after fetching part with unknown projection (re-fetch loop) #100413

@zlareb1

Describe what's wrong

PR #99623 fixed parts with unknown projections being marked as lost, but the fix is incomplete on the receiver side (the replica that fetches the part). After a successful fetch, a subsequent CHECK TABLE or the background part-check thread calls checkDataPart and fails with NO_FILE_IN_DATA_PART. The part is then marked as broken, which triggers an infinite re-fetch loop.

Root cause:

downloadPartToDisk (fixed by #99623) patches data_checksums in memory so that the immediate checkEqual passes, but checksums.txt is still written to disk verbatim from the sender, including the pp.proj entry that was never transferred. The entry is never transferred because the projection is unknown: getProjectionParts() has no entry for it, so the send_projections loop skips it.

When checkDataPart later runs on replica 2:

  • projections_on_disk is built by iterating the part directory for .proj subdirectories — pp.proj does not exist on disk, so projections_on_disk stays empty for it.
  • getProjectionParts() does not include pp (unknown projection).
  • The new guard added by #99623 (redo of "Part with unknown projections should not be marked as lost forever"), if (!projections_on_disk.empty()), does not fire.
  • checksums_txt (loaded from disk) still has pp.proj.
  • checksums_data does not have pp.proj (directory never existed on disk).
  • checksums_txt.checkEqual(checksums_data) throws NO_FILE_IN_DATA_PART.

The part is marked broken → re-fetched from replica 1 → same outcome → infinite loop.
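
The mismatch described above can be sketched with a toy model. This is a minimal illustration only: ToyChecksums, toyCheckEqual and fetchedPartCheckMessage are hypothetical names, not ClickHouse APIs, and the comparison is deliberately simplified to the one direction that matters here (every entry in checksums.txt must be backed by a file found on disk).

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a checksums structure: file name -> checksum.
// The real ClickHouse checksums class carries more state than this.
using ToyChecksums = std::map<std::string, unsigned long long>;

// Simplified analogue of checksums_txt.checkEqual(checksums_data):
// every entry recorded in checksums.txt must exist in the data on disk.
void toyCheckEqual(const ToyChecksums & checksums_txt, const ToyChecksums & checksums_data)
{
    for (const auto & entry : checksums_txt)
    {
        auto it = checksums_data.find(entry.first);
        if (it == checksums_data.end())
            throw std::runtime_error("NO_FILE_IN_DATA_PART: No file " + entry.first + " in data part");
        if (it->second != entry.second)
            throw std::runtime_error("checksum mismatch for " + entry.first);
    }
}

// Models replica 2 after the fetch: checksums.txt was copied verbatim from
// the sender, but the pp.proj directory was never transferred.
// Returns the error message, or an empty string if the check passes.
std::string fetchedPartCheckMessage()
{
    ToyChecksums checksums_txt = {{"data.bin", 1}, {"p.proj", 2}, {"pp.proj", 3}};
    ToyChecksums checksums_data = {{"data.bin", 1}, {"p.proj", 2}};
    try
    {
        toyCheckEqual(checksums_txt, checksums_data);
    }
    catch (const std::runtime_error & e)
    {
        return e.what();
    }
    return "";
}
```

In this model nothing on the receiver ever changes checksums_txt, so every re-fetch reproduces the same two maps and the same failure, which is the loop described above.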

The fix in checkDataPart.cpp only handles the local-attach case (.proj directory exists on disk but is unknown to the table, i.e. projections_on_disk is non-empty). It does not handle the fetch case (checksums.txt references a .proj entry whose directory was never transferred).

Does it reproduce on the most recent release?

Yes — introduced/unblocked by #99623 (merged 2026-03-21).

How to reproduce

Requires ReplicatedMergeTree with 2 replicas and a ZooKeeper/Keeper.

-- On both replicas:
CREATE TABLE t (x Int32, y Int32, PROJECTION p (SELECT x, y ORDER BY x))
ENGINE = ReplicatedMergeTree('/clickhouse/tables/test/t', '{replica}')
PARTITION BY intDiv(y, 100) ORDER BY y
SETTINGS max_parts_to_merge_at_once = 1;

-- On replica 1:
INSERT INTO t SELECT number, number FROM numbers(7);
ALTER TABLE t ADD PROJECTION pp (SELECT x, count() GROUP BY x);
ALTER TABLE t MATERIALIZE PROJECTION pp SETTINGS mutations_sync=2;
ALTER TABLE t DETACH PARTITION 0;

-- On replica 2: remove the detached copy so it is forced to re-fetch later
ALTER TABLE t DROP DETACHED PARTITION 0 SETTINGS allow_drop_detached=1;

-- On replica 1: drop projection while part is detached, then re-attach
ALTER TABLE t DROP PROJECTION pp;
ALTER TABLE t ATTACH PARTITION 0;

-- On replica 2: sync, then run CHECK TABLE
SYSTEM SYNC REPLICA t;
SELECT count() FROM t;   -- returns 7 (data is fine)
CHECK TABLE t;           -- fails: NO_FILE_IN_DATA_PART for pp.proj

The SELECT succeeds because data is intact. CHECK TABLE fails because checksums.txt references pp.proj but the directory was never transferred. The background ReplicatedMergeTreePartCheckThread will then mark the part as broken and re-fetch it from replica 1, creating a loop.

The same state can be reproduced without replication — manually remove a .proj directory from a detached part whose checksums.txt still references it, then attach and run CHECK TABLE:

# After the detach step above, in a shell on the server:
rm -rf /var/lib/clickhouse/data/default/t/detached/<part_name>/pp.proj

-- Then, in clickhouse-client:
ALTER TABLE t ATTACH PARTITION 0;
SELECT count() FROM t;   -- 7, works
CHECK TABLE t;           -- Code: 226. NO_FILE_IN_DATA_PART: No file pp.proj in data part

Suggested fix

In checkDataPart.cpp, after the block added by #99623 (which handles projections whose directories exist on disk), also remove from checksums_txt any .proj entries that are absent from checksums_data (i.e., whose directory was never on disk):

/// Also handle the case where checksums_txt references a .proj entry whose
/// directory is absent from disk entirely (e.g. receiver after an inter-replica
/// fetch of a part with an unknown projection — checksums.txt was transferred
/// but the .proj directory was not).
for (auto it = checksums_txt.files.begin(); it != checksums_txt.files.end(); )
{
    if (it->first.ends_with(".proj") && !checksums_data.has(it->first))
    {
        is_broken_projection = true;
        it = checksums_txt.files.erase(it);
    }
    else
        ++it;
}
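
The erase-while-iterating pattern in the snippet above can be tried in isolation. The following is a sketch under assumptions, not the actual patch: ToyChecksums, hasProjSuffix and dropMissingProjections are hypothetical names, a plain std::map stands in for the real checksums structure, and the suffix helper replaces ends_with only so the example does not require C++20.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a checksums structure: file name -> checksum.
using ToyChecksums = std::map<std::string, unsigned long long>;

// True if `name` ends with ".proj" (pre-C++20 replacement for ends_with).
bool hasProjSuffix(const std::string & name)
{
    static const std::string suffix = ".proj";
    return name.size() >= suffix.size()
        && name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Removes from checksums_txt every .proj entry that has no counterpart in
// checksums_data, and returns how many entries were removed.
std::size_t dropMissingProjections(ToyChecksums & checksums_txt, const ToyChecksums & checksums_data)
{
    std::size_t removed = 0;
    for (auto it = checksums_txt.begin(); it != checksums_txt.end();)
    {
        if (hasProjSuffix(it->first) && checksums_data.count(it->first) == 0)
        {
            // map::erase returns the iterator following the erased element,
            // so the loop stays valid without a separate increment.
            it = checksums_txt.erase(it);
            ++removed;
        }
        else
            ++it;
    }
    return removed;
}
```

Only .proj entries are dropped. A missing regular file stays in checksums_txt, so a later comparison would still report genuine corruption.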

The test 03822_attach_with_unknown_projection.sh should also add CHECK TABLE t_unknown_proj_2 after SYSTEM SYNC REPLICA t_unknown_proj_2 to cover this path.
