[BUG] [SCHEMA CHANGE] Create tablet fails during schema change #6015

@liutang123

Describe the bug
We have a table with many replicas.
While a schema change job is waiting to run, the balancer may delete a replica of the base tablet, so creating the new tablet on that backend may fail.

To Reproduce
Steps to reproduce the behavior:

  1. Create a schema change job:
ALTER TABLE test_db.test_table ADD COLUMN name varchar(100) comment 'xxx'
  2. FE creates the schema change job:
2021-06-11 14:17:57,739 INFO (thrift-server-pool-150|359) [SchemaChangeHandler.createJob():1383] finished to create schema change job: 64494481

This step generates the partitionIndexTabletMap of SchemaChangeJobV2, so the locations of the new tablet replicas are fixed from this point on.
  3. The job waits for the table to become stable:

2021-06-11 14:18:12,816 INFO (schema change|25) [OlapTable.isStable():1391] table 23196651 is not stable because tablet 60422768 status is REDUNDANT. replicas: [[replicaId=60422770, BackendId=10003], [replicaId=60422771, BackendId=10006], [replicaId=62269543, BackendId=61958307, version=2], [replicaId=64494415, BackendId=61958301]]

Tablet 60422768 is REDUNDANT.
  4. TabletScheduler removes the replica of tablet 60422768 on backend 10006 from FE meta:

2021-06-11 14:18:26,495 INFO (tablet scheduler|38) [TabletScheduler.deleteReplicaInternal():982] delete replica. tablet id: 60422768, backend id: 10006. reason: DECOMMISSION state, force: false
  5. When backend 10006 next reports its tablets, FE deletes the replica from the backend because it is no longer in FE meta:
2021-06-11 14:19:12,757 WARN (Thread-33|79) [ReportHandler.deleteFromBackend():677] failed add to meta. tablet[60422768], backend[10006]. errCode = 2, detailMessage = replica is enough[3-3]
2021-06-11 14:19:12,757 WARN (Thread-33|79) [ReportHandler.deleteFromBackend():690] delete tablet[60422768 - 118915135] from backend[10006] because not found in meta
  6. The table becomes stable and the job starts creating tablets:
2021-06-11 14:20:12,947 INFO (schema change|25) [AlterJobV2.checkTableStable():209] table 23196651 is stable, start SCHEMA_CHANGE job {}
  7. BE fails to create the tablet because it cannot find base tablet 60422768:
W0611 14:20:46.569319 425891 tablet_manager.cpp:244] fail to create tablet(change schema), base tablet does not exist. new_tablet_id=64530888, new_schema_hash=1683434764, base_tablet_id=60422768, base_schema_hash=118915135
  8. The schema change fails:
2021-06-11 14:20:46,628 WARN (schema change|25) [SchemaChangeJobV2.runPendingJob():309] failed to create replicas for job: 64494481, 10006: []
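
The sequence above can be modeled with a small, self-contained simulation. This is only a sketch of the race as the logs suggest it, not Doris code: the `Cluster` and `SchemaChangeJob` classes and every function name below are hypothetical, and the replication number of 3 is an assumption read from the `replica is enough[3-3]` message. The tablet and backend IDs are taken from the logs.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass
class Cluster:
    # Hypothetical model of FE metadata: tablet id -> backends holding a replica.
    fe_meta: Dict[int, Set[int]]
    # Hypothetical model of BE storage: backend id -> tablets stored on it.
    be_tablets: Dict[int, Set[int]]


@dataclass(frozen=True)
class SchemaChangeJob:
    base_tablet_id: int
    new_tablet_id: int
    # Placement is captured once, when the job is created (step 2).
    planned_backends: FrozenSet[int]


def create_job(cluster: Cluster, base_id: int, new_id: int) -> SchemaChangeJob:
    # Snapshot the current replica locations; later meta changes are not seen.
    return SchemaChangeJob(base_id, new_id, frozenset(cluster.fe_meta[base_id]))


def scheduler_drop_replica(cluster: Cluster, tablet_id: int, backend_id: int) -> None:
    # Step 4: the redundant replica is removed from FE meta only.
    cluster.fe_meta[tablet_id].discard(backend_id)


def handle_report(cluster: Cluster, backend_id: int, replication_num: int) -> None:
    # Step 5: a reported tablet that is missing from FE meta is either added
    # back (if replicas are short) or deleted from the backend.
    for tablet_id in list(cluster.be_tablets[backend_id]):
        known = cluster.fe_meta.setdefault(tablet_id, set())
        if backend_id in known:
            continue
        if len(known) < replication_num:
            known.add(backend_id)
        else:
            cluster.be_tablets[backend_id].discard(tablet_id)


def run_pending_job(cluster: Cluster, job: SchemaChangeJob) -> List[int]:
    # Steps 6-8: create the new tablet on each planned backend; creation
    # fails wherever the base tablet no longer exists.
    failed = []
    for backend_id in sorted(job.planned_backends):
        if job.base_tablet_id in cluster.be_tablets[backend_id]:
            cluster.be_tablets[backend_id].add(job.new_tablet_id)
        else:
            failed.append(backend_id)
    return failed


def reproduce() -> List[int]:
    base_id, new_id = 60422768, 64530888
    backends = [10003, 10006, 61958307, 61958301]
    cluster = Cluster(
        fe_meta={base_id: set(backends)},
        be_tablets={b: {base_id} for b in backends},
    )
    job = create_job(cluster, base_id, new_id)
    scheduler_drop_replica(cluster, base_id, 10006)
    handle_report(cluster, 10006, replication_num=3)  # assumed replication number
    return run_pending_job(cluster, job)


if __name__ == "__main__":
    print("backends where tablet creation failed:", reproduce())
```

In this model the failure comes from the job keeping the placement it captured at creation time while the scheduler and the report handler change the replicas underneath it.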
