
[fix](cloud)(restore) fix broken schema during restore of lsc=false tables#62708

Open
LemonCL wants to merge 2 commits into apache:master from LemonCL:fix/cloud-restore-lsc-false-broken-schema


@LemonCL LemonCL commented Apr 22, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: N/A

Problem Summary:

In cloud mode, when restoring a snapshot of a table that was created with
light_schema_change = false, the schema KV under
meta_schema_key({instance_id, index_id, schema_version}) ends up with every
column carrying unique_id = -1, because the schema is first written by
create_tablet (where unique_ids are not yet materialized). commit_restore_job
then receives the correct schema from each rowset meta in the backup, but
the existing put_schema_kv is a no-op when the key already exists, so the
broken schema leaks through and subsequent reads fail with errors such as
column reader is nullptr or different type between schema and column reader.

Upstream has a related DCHECK in put_schema_kv that compares new-vs-saved
schemas (introduced by #55247), and an explicit comment in
MetaServiceImpl::update_tablet warning that put_schema_kv skips writing
when the key already exists, but the restore code path was never patched. PR
#57074 fixed a related index_id issue in the same restore path but did not
touch column unique_id. PR #50657 only prevents new cloud tables from
setting light_schema_change=false, so it does not help legacy snapshots
being restored.

Root cause

  1. create_tablet (called during restore) writes a TabletSchemaCloudPB with
    all unique_id = -1 into the schema KV. This is expected behaviour for
    light_schema_change=false at tablet-creation time.
  2. commit_restore_job receives the real schema (with unique_id >= 0) in
    each rowset meta, and tries to persist it via put_schema_kv.
  3. put_schema_kv sees the key already exists and silently returns without
    writing, so the broken schema stays in the schema KV indefinitely.
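
The three steps above can be replayed with a small in-memory model. This is only a sketch: SchemaSketch, SchemaKv, put_schema_kv_sketch and the key string are hypothetical stand-ins, not the real TabletSchemaCloudPB, FDB transaction, or key encoding. It shows only how a skip-if-exists write lets the first (broken) value win.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for TabletSchemaCloudPB: only the column unique_ids matter here.
struct SchemaSketch {
    std::vector<int32_t> column_unique_ids;
};

// Hypothetical in-memory stand-in for the schema KV space.
using SchemaKv = std::map<std::string, SchemaSketch>;

// Models the skip-if-exists behaviour described above. Returns true only if it wrote.
bool put_schema_kv_sketch(SchemaKv& kv, const std::string& key, const SchemaSketch& schema) {
    if (kv.count(key) > 0) {
        return false; // key already exists: silently keep the old value
    }
    kv[key] = schema;
    return true;
}

// Replays the root cause and returns the unique_id of column 0 that survives.
int32_t replay_restore_sketch() {
    SchemaKv kv;
    const std::string key = "schema/index_10/version_0"; // illustrative, not the real encoding
    // Step 1: create_tablet writes a schema whose unique_ids are all -1.
    const bool first = put_schema_kv_sketch(kv, key, SchemaSketch{{-1, -1}});
    // Steps 2-3: commit_restore_job offers the real schema, but the write is skipped.
    const bool second = put_schema_kv_sketch(kv, key, SchemaSketch{{0, 1}});
    assert(first && !second);
    return kv.at(key).column_unique_ids.front();
}
```

In this model replay_restore_sketch() returns -1, which mirrors the state the restored table is left in.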

Fix

  • New function put_schema_kv_on_restore() in
    cloud/src/meta-service/meta_service_schema.cpp:
    • reads the existing schema value,
    • treats it as broken when the value cannot be parsed, or when
      column(0).unique_id() == -1,
    • range-removes all blob chunks of the stale schema
      ([schema_key, schema_key + encode_int64(INT64_MAX))) and writes the
      correct one.
    • Defensive check: if the incoming schema is itself broken
      (column_size() == 0 or column(0).unique_id() == -1), skip the write
      and log a WARNING. This guarantees we never replace a bad schema with
      another bad one, even if a future caller passes a malformed schema.
  • MetaServiceImpl::commit_restore_job in
    cloud/src/meta-service/meta_service.cpp:
    • replaces the two existing put_schema_kv() call sites with
      put_schema_kv_on_restore(),
    • guards each call with an in-RPC std::set<std::string> restored_schema_keys
      so the same (index_id, schema_version) pair does not issue redundant
      FDB reads/writes when the restore spans many rowsets,
    • counts put/skip events via four new counters that are logged at the end of
      the RPC (rs_meta_schema_put_cnt, rs_meta_schema_skip_cnt,
      tablet_meta_schema_put_cnt, tablet_meta_schema_skip_cnt).
  • put_versioned_schema_kv() is intentionally neither wrapped by the dedup set
    nor rerouted: it targets a separate key space
    (versioned::meta_schema_key), is already skip-if-exists internally, and is
    out of scope for this fix.
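
For reviewers who want the decision logic at a glance, here is a minimal sketch of the branches put_schema_kv_on_restore() is described as taking. It is not the code in this PR. The types, the RestorePutResult enum and the in-memory map are hypothetical, the order of the checks is an assumption, unparseable stored values are not modelled, and an empty stored schema is assumed to count as broken.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for TabletSchemaCloudPB; only column unique_ids are modelled.
struct SchemaSketch {
    std::vector<int32_t> column_unique_ids;
};

// Hypothetical in-memory stand-in for the schema KV space (no chunking, no parsing).
using SchemaKv = std::map<std::string, SchemaSketch>;

enum class RestorePutResult { kWritten, kSkippedExistingGood, kSkippedIncomingBroken };

// Broken-schema signature from the description: the first column has unique_id == -1.
// Assumption: a schema with no columns is also treated as broken.
bool is_broken_sketch(const SchemaSketch& schema) {
    return schema.column_unique_ids.empty() || schema.column_unique_ids.front() == -1;
}

RestorePutResult put_schema_kv_on_restore_sketch(SchemaKv& kv, const std::string& key,
                                                 const SchemaSketch& incoming) {
    // Defensive guard: never replace a bad schema with another bad one.
    if (is_broken_sketch(incoming)) {
        return RestorePutResult::kSkippedIncomingBroken;
    }
    auto it = kv.find(key);
    // A healthy schema is already stored: keep the skip-if-exists behaviour.
    if (it != kv.end() && !is_broken_sketch(it->second)) {
        return RestorePutResult::kSkippedExistingGood;
    }
    // Key absent, or stored schema broken: write (or overwrite) with the incoming schema.
    kv[key] = incoming;
    return RestorePutResult::kWritten;
}

// Usage: the bug-fix path, where a broken stored schema is replaced.
void overwrite_example_sketch() {
    SchemaKv kv;
    kv["k"] = SchemaSketch{{-1, -1}};
    const RestorePutResult r = put_schema_kv_on_restore_sketch(kv, "k", SchemaSketch{{0, 1}});
    assert(r == RestorePutResult::kWritten);
    assert(kv.at("k").column_unique_ids.front() == 0);
}
```

The three result values line up with the paths the unit tests name: first write or overwrite, skip-if-healthy, and defensive skip.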

Tests

  • Unit tests in cloud/test/meta_service_test.cpp covering every branch of
    put_schema_kv_on_restore:
    • PutSchemaKvOnRestoreTest.PutWhenKeyNotExist — first-write path.
    • PutSchemaKvOnRestoreTest.NoopWhenExistingSchemaIsGood — skip-if-healthy
      path.
    • PutSchemaKvOnRestoreTest.OverwriteWhenExistingIsBroken — the actual
      bug-fix path: seeds a broken schema with unique_id=-1, calls the new
      function, and asserts the resulting KV holds the good schema.
    • PutSchemaKvOnRestoreTest.DefensiveSkipWhenIncomingHasEmptyColumns
      verifies the defensive guard refuses an empty incoming schema.
    • PutSchemaKvOnRestoreTest.DefensiveSkipWhenIncomingHasUidNegativeOne
      verifies the defensive guard refuses an incoming schema whose first
      column still has unique_id == -1.

Run just these tests with:

sh run-cloud-ut.sh --run --filter='PutSchemaKvOnRestoreTest.*'

Release note

Fix a correctness bug in cloud mode where restoring a snapshot of a table
created with light_schema_change = false would leave the schema KV with
every column having unique_id = -1, causing subsequent queries on the
restored table to fail with column reader is nullptr or similar errors.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…ables

Problem
-------
In cloud mode, when a snapshot of a table created with
`light_schema_change = false` is restored, the schema KV persisted under
`meta_schema_key({instance_id, index_id, schema_version})` ends up with
every column carrying `unique_id = -1`, because the schema is first
written by `create_tablet` and at that point the unique_ids are not yet
assigned. `commit_restore_job` then receives the correct schema (with
valid `unique_id >= 0`) in each rowset meta from the backup, but the
existing `put_schema_kv` is a no-op when the key already exists, so the
broken schema leaks through and subsequent reads fail with errors such
as `column reader is nullptr` or `different type between schema and
column reader`.

Fix
---
Introduce `put_schema_kv_on_restore()` in the cloud MetaService. It
reads the existing schema value, detects the broken-schema signature
(`column(0).unique_id() == -1` or an unparseable value), range-removes
all chunks of the stale schema, and writes the correct one. To avoid
replacing a bad schema with another bad one, it also refuses to write
a schema that is itself broken (empty columns or
`column(0).unique_id() == -1`) and only logs a warning in that case.
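
The range-remove step can be illustrated on an ordered in-memory map. This is a sketch under assumptions: `RawKv`, `chunk_suffix_sketch` and `remove_schema_chunks_sketch` are hypothetical names, and the 8-byte big-endian suffix is only an order-preserving stand-in, not the real `encode_int64` format. The point is that one half-open range `[schema_key, schema_key + suffix(INT64_MAX))` covers the key and all of its chunk keys while leaving neighbouring keys untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// Hypothetical ordered KV store; std::map orders std::string keys bytewise.
using RawKv = std::map<std::string, std::string>;

// Hypothetical order-preserving suffix: 8 bytes, big-endian, non-negative n only.
// This is NOT the real encode_int64 format.
std::string chunk_suffix_sketch(int64_t n) {
    assert(n >= 0);
    std::string out(8, '\0');
    uint64_t bits = static_cast<uint64_t>(n);
    for (int i = 7; i >= 0; --i) {
        out[static_cast<std::size_t>(i)] = static_cast<char>(bits & 0xffu);
        bits >>= 8;
    }
    return out;
}

// Removes every key in the half-open range [begin, end) and returns how many were removed.
std::size_t remove_range_sketch(RawKv& kv, const std::string& begin, const std::string& end) {
    auto first = kv.lower_bound(begin);
    auto last = kv.lower_bound(end);
    const auto removed = static_cast<std::size_t>(std::distance(first, last));
    kv.erase(first, last);
    return removed;
}

// Clears the stale schema stored at schema_key together with all of its chunk keys.
std::size_t remove_schema_chunks_sketch(RawKv& kv, const std::string& schema_key) {
    return remove_range_sketch(kv, schema_key, schema_key + chunk_suffix_sketch(INT64_MAX));
}
```

Clearing the whole range before writing matters when the stale value spans more chunks than the new one, since leftover chunks could otherwise be read back as part of the new schema.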

In `MetaServiceImpl::commit_restore_job`, replace both existing
`put_schema_kv` call sites with `put_schema_kv_on_restore`, guarded by
an in-RPC `std::set<std::string>` so the same
`(index_id, schema_version)` pair does not issue redundant FDB
reads/writes when the restore spans many rowsets. Four counters
(`rs_meta_schema_put_cnt/skip_cnt`,
`tablet_meta_schema_put_cnt/skip_cnt`) are logged at the end of the
RPC to make the behaviour observable in production.
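
A minimal sketch of the in-RPC dedup guard, assuming one schema key string per rowset. `RestoreCountersSketch` and `put_schemas_deduped_sketch` are hypothetical names, and counting a dedup hit as a "skip" is an assumption, since the exact semantics of the four counters are not spelled out here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical pair of counters, analogous to one put_cnt/skip_cnt pair.
struct RestoreCountersSketch {
    int put_cnt = 0;
    int skip_cnt = 0;
};

// Calls `put` at most once per distinct schema key seen during one RPC.
template <typename PutFn>
RestoreCountersSketch put_schemas_deduped_sketch(const std::vector<std::string>& rowset_schema_keys,
                                                 PutFn put) {
    std::set<std::string> restored_schema_keys; // lives only for the duration of the RPC
    RestoreCountersSketch counters;
    for (const std::string& key : rowset_schema_keys) {
        // insert().second is false when the key was already handled in this RPC.
        if (!restored_schema_keys.insert(key).second) {
            ++counters.skip_cnt;
            continue;
        }
        put(key);
        ++counters.put_cnt;
    }
    return counters;
}

// Usage: four rowsets that share two distinct (index_id, schema_version) pairs.
void dedup_example_sketch() {
    int kv_calls = 0;
    const RestoreCountersSketch c = put_schemas_deduped_sketch(
            {"i1/v0", "i1/v0", "i1/v1", "i1/v0"}, [&](const std::string&) { ++kv_calls; });
    assert(kv_calls == 2);
    assert(c.put_cnt == 2 && c.skip_cnt == 2);
}
```

With many rowsets per tablet, the set keeps the number of KV round trips proportional to the number of distinct schemas rather than the number of rowsets.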

`put_versioned_schema_kv()` is intentionally NOT wrapped by the dedup
set: it targets a different key space (`versioned::meta_schema_key`)
and is already skip-if-exists inside the function.

Tests
-----
`cloud/test/meta_service_test.cpp` adds 5 unit tests covering every
branch of `put_schema_kv_on_restore`:
  - `PutWhenKeyNotExist`                     — first-write path
  - `NoopWhenExistingSchemaIsGood`           — skip-if-healthy path
  - `OverwriteWhenExistingIsBroken`          — the fix itself
  - `DefensiveSkipWhenIncomingHasEmptyColumns`  — defensive guard
  - `DefensiveSkipWhenIncomingHasUidNegativeOne` — defensive guard
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@LemonCL
Contributor Author

LemonCL commented Apr 22, 2026

run buildall

@LemonCL
Contributor Author

LemonCL commented Apr 22, 2026

Hi @plat1ko @meiyi @wyxxxcat @xy720 , could you please take a look
at this fix when you have time?

It addresses a correctness bug on the cloud restore path: when a
snapshot of a table created with light_schema_change = false is
restored, the schema KV under meta_schema_key(...) keeps
column.unique_id = -1 because put_schema_kv is skip-if-exists
and never overwrites the bad schema written earlier by create_tablet.
The new put_schema_kv_on_restore detects and overwrites only the
broken case, with a defensive guard to avoid replacing a bad schema
with another bad one. Five unit tests cover every branch.

Thanks!

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 76.19% (48/63) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.07% (1844/2362)
Line Coverage 64.74% (33019/51004)
Region Coverage 65.25% (16390/25118)
Branch Coverage 55.79% (8745/15674)

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 76.19% (48/63) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.07% (1844/2362)
Line Coverage 64.74% (33018/51004)
Region Coverage 65.24% (16387/25118)
Branch Coverage 55.77% (8741/15674)

@gavinchou added the area/backup ("Issues of PRS related to backup and restore") and cloud labels on Apr 29, 2026
@xy720
Member

xy720 commented May 28, 2026

run buildall
