Schema Skew Issue #1986

@hananbs

Hi team.
We are using the operator, and sometimes, due to a race condition, we end up with permanent schema skew.

Example: if an error occurs while migrateTables() runs, the error is simply logged and reconciliation proceeds. The host is still added to HostWithTablesCreated(), meaning the operator will not attempt to fix the schema skew later. The status is also set to completed, and all shard pods are marked as ready regardless of the schema state, since readiness relies on /ping.

Another example: if 'Create Table .. ON CLUSTER' queries are running on shard-0 while shard-1 is still bootstrapping, shard-1 might miss some of them. Because of the kubelet syncFrequency (1m by default), it takes time until shard-0 becomes aware that shard-1 was added to the actual remote_servers.xml on disk.

I am wondering what the best way to solve this is, since the operator seems to be 'best-effort' for DDL alignment. I thought about migrating to the Replicated database engine, which solves exactly that. But even if I switch, the operator still seems to be best-effort when running the Create Database with the Replicated engine on new pods/shards in case of an error.

For example: when shard-2 joins the cluster, the operator will attempt to run 'Create Database Engine = Replicated'. But if that also fails, it will still add that host to the cluster.

I was thinking of adding a custom readiness probe, perhaps based on DDL alignment or some topology gate. The problem is that, after some analysis of the code, the operator itself seems to rely on the Ready status. So I might end up in a deadlock, especially during rolling upgrades (restarts), where pods might get stuck.
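A DDL-alignment check could be as small as a comparison of table definitions between a reference host and a candidate host. The sketch below assumes the two maps (table name to CREATE statement) have already been collected, for example from the `create_table_query` column of `system.tables`, and that the statements are normalized enough for exact string comparison. `schemaSkew` is a hypothetical helper, not an operator function.

```go
package main

import "sort"

// schemaSkew returns the sorted names of tables that exist on the reference
// host but are missing or defined differently on the candidate host.
// Both maps go from a fully qualified table name to its CREATE statement.
func schemaSkew(reference, candidate map[string]string) []string {
	var skewed []string
	for table, refDDL := range reference {
		if ddl, ok := candidate[table]; !ok || ddl != refDDL {
			skewed = append(skewed, table)
		}
	}
	sort.Strings(skewed)
	return skewed
}
```

To avoid the deadlock concern with the Ready status, a check like this might be safer as an alert or a reconcile-time gate than as a pod readiness probe.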

Another option I thought about is to not swallow the error in migrateTables() but return it, and perhaps re-queue so that the reconcile is retried. Or is there a reason for the operator to work with a best-effort strategy?
Currently, whether migrateTables() succeeds or fails, the host ends up added to HostWithTablesCreated().

We are using versions 0.25.2 and 0.26.3

Thanks
