This repository has been archived by the owner on Feb 6, 2024. It is now read-only.

Refactor shard version related logic #263

Open
ZuLiangWang opened this issue Oct 26, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@ZuLiangWang
Contributor

Description
The current shard version verification has the following problems:

  • CeresMeta and CeresDB each maintain their own shard versions independently. When the two become inconsistent, the only way to recover is to restart the CeresDB node.
  • Shard version synchronization is ad hoc and prone to unexpected version inconsistencies.
  • The shard version verification logic limits concurrent DDL: only one DDL can succeed on a shard at a time.
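
To illustrate the third problem, here is a minimal sketch of a strict version check. The `Shard` type, `ApplyDDL`, and `ErrVersionMismatch` are hypothetical names made up for this example, not the actual CeresMeta or CeresDB code, and it assumes each successful DDL increments the version by 1.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrVersionMismatch is a hypothetical error used only in this sketch.
var ErrVersionMismatch = errors.New("shard version mismatch")

// Shard is a hypothetical, simplified model that tracks only the shard version.
type Shard struct {
	Version uint64
}

// ApplyDDL succeeds only if the caller's expected version equals the current
// shard version, and then increments the version (an assumption of this sketch).
func (s *Shard) ApplyDDL(expected uint64) error {
	if expected != s.Version {
		return fmt.Errorf("%w: expected %d, current %d", ErrVersionMismatch, expected, s.Version)
	}
	s.Version++
	return nil
}

func main() {
	shard := &Shard{Version: 5}
	// Two DDLs were planned against the same observed version 5.
	first := shard.ApplyDDL(5)  // succeeds and bumps the version to 6
	second := shard.ApplyDDL(5) // fails because the version is already 6
	fmt.Println(first, second, shard.Version)
}
```

Under this kind of check, any two DDLs planned against the same version cannot both succeed, even if they touch different tables.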

Proposal
Redesign and implement shard version related logic.

Additional context
Some current thoughts:

  1. How should the shard version be synchronized between meta and CeresDB?
    1. Return the latest version in the responses to create-table and drop-table requests (I prefer this solution).
    2. Synchronize the latest version through heartbeats.
    3. Meta pulls the latest version through an interface provided by CeresDB.
  2. Who persists the shard version information?
    1. Keep it as is: meta persists the version, and CeresDB synchronizes it from meta when opening a shard (I prefer this solution).
    2. CeresDB persists the version and, when opening a shard, reports it to meta in the response.
  3. How should the version be handled when a shard is operated on concurrently?
    1. Leave it as is: only one operation succeeds and the others fail.
    2. Batch the operations: when create-table and drop-table operations are grouped into a batch, we must decide how to increment the version.
      1. The whole batch increments the version by 1.
      2. Each operation in the batch increments the version by 1.
  4. Are version inconsistencies allowed within a certain range?
    1. Not allowed; versions must be completely consistent (current method).
    2. Allowed: record the operations on the shard, and skip the version check for operations that tolerate changes or a bounded version gap.
  5. How do we recover when versions are inconsistent?
    1. Manually restart the node (current method, not acceptable).
    2. Automatic error correction and recovery:
      1. Meta periodically inspects all shard versions and, for inconsistent ones, initiates a repair operation on CeresDB.
      2. CeresDB is responsible for error correction: when it receives a request with an inconsistent version, it initiates a repair operation on CeresMeta.
      3. How exactly should the error be corrected, and what must be done before synchronizing to a consistent version?
        1. Try to recreate or drop the table so that the failed procedure can be executed successfully.
        2. Ignore the failure and force version synchronization.
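
The two batch increment options under question 3 can be compared with a small sketch. `IncrementPolicy`, `PerBatch`, `PerOperation`, and `NextVersion` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// IncrementPolicy is a hypothetical enum naming the two options from question 3.
type IncrementPolicy int

const (
	// PerBatch: the whole batch increments the version by 1.
	PerBatch IncrementPolicy = iota
	// PerOperation: each operation in the batch increments the version by 1.
	PerOperation
)

// NextVersion returns the shard version after a batch of opCount operations
// has been applied under the given policy. An empty batch changes nothing.
func NextVersion(current uint64, opCount int, policy IncrementPolicy) uint64 {
	if opCount <= 0 {
		return current
	}
	if policy == PerOperation {
		return current + uint64(opCount)
	}
	return current + 1
}

func main() {
	// A batch of three operations (for example two creates and one drop)
	// applied to a shard at version 10.
	fmt.Println(NextVersion(10, 3, PerBatch))     // 11
	fmt.Println(NextVersion(10, 3, PerOperation)) // 13
}
```

With per-operation increments, the version difference also tells the other side how many operations it missed; with per-batch increments it only signals that something changed.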
ZuLiangWang added the enhancement label on Oct 26, 2023
ShiKaiWi pushed a commit that referenced this issue Nov 2, 2023
## Rationale
For details, see: #263
This PR adds check-and-repair logic for the shard version.

## Detailed Changes
* Add `MayCorrectShardVersion` to `RegisterNode`. It corrects the shard
version when it is inconsistent between ceresmeta and ceresdb.

## Test Plan
I created several local scenarios with inconsistent shard versions to
verify the repair logic.
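
A rough sketch of what such a correction step might look like when a node registers. The commit message does not say which side wins on a mismatch, so this example assumes meta adopts the version reported by the CeresDB node. `correctShardVersions` and the map-based storage are hypothetical and are not the real `MayCorrectShardVersion` implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// correctShardVersions compares the versions reported by a registering node
// with the versions stored in meta. On a mismatch it overwrites the stored
// version with the reported one (an assumption of this sketch) and returns
// the IDs of the corrected shards in ascending order.
func correctShardVersions(metaVersions, reported map[uint32]uint64) []uint32 {
	var corrected []uint32
	for shardID, reportedVersion := range reported {
		stored, known := metaVersions[shardID]
		if !known {
			// Shards unknown to meta are ignored in this sketch.
			continue
		}
		if stored != reportedVersion {
			metaVersions[shardID] = reportedVersion
			corrected = append(corrected, shardID)
		}
	}
	sort.Slice(corrected, func(i, j int) bool { return corrected[i] < corrected[j] })
	return corrected
}

func main() {
	metaVersions := map[uint32]uint64{1: 7, 2: 4}
	reported := map[uint32]uint64{1: 7, 2: 6, 3: 1}
	// Shard 2 differs and is corrected; shard 3 is unknown and skipped.
	fmt.Println(correctShardVersions(metaVersions, reported), metaVersions)
}
```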
ShiKaiWi pushed a commit that referenced this issue Nov 8, 2023
## Rationale
Refer to this issue: #263

## Detailed Changes
* Rework the create/drop table process so that shard version updates
are driven by CeresDB

## Test Plan
Pass existing unit tests and integration tests.
ShiKaiWi added a commit to apache/horaedb that referenced this issue Nov 9, 2023
## Rationale
For details, see: apache/incubator-horaedb-meta#263

## Detailed Changes
* Modify the return value of `CreateTableOnShard` & `DropTableOnShard`
to return the latest shard version.

## Test Plan
Pass all unit tests and integration tests.

---------

Co-authored-by: xikai.wxk <xikai.wxk@antgroup.com>
Co-authored-by: WEI Xikai <ShiKaiWi@users.noreply.github.com>
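
The flow described in the last two commits (the node returns the latest shard version and meta adopts it) could be sketched as follows. The `node` and `meta` types are in-memory stand-ins invented for this example, the method names only mirror `CreateTableOnShard` and `DropTableOnShard` loosely, and the "+1 per operation" rule is an assumption.

```go
package main

import "fmt"

// node is a hypothetical in-memory stand-in for a CeresDB node owning one shard.
type node struct {
	version uint64
	tables  map[string]bool
}

// createTableOnShard creates the table and returns the latest shard version.
func (n *node) createTableOnShard(table string) (uint64, error) {
	if n.tables[table] {
		return n.version, fmt.Errorf("table %q already exists", table)
	}
	n.tables[table] = true
	n.version++ // assumption: each successful DDL bumps the version by 1
	return n.version, nil
}

// dropTableOnShard drops the table and returns the latest shard version.
func (n *node) dropTableOnShard(table string) (uint64, error) {
	if !n.tables[table] {
		return n.version, fmt.Errorf("table %q not found", table)
	}
	delete(n.tables, table)
	n.version++
	return n.version, nil
}

// meta is a hypothetical stand-in for the metadata service's view of the shard.
type meta struct {
	shardVersion uint64
}

// createTable forwards the DDL and adopts the version returned by the node
// instead of computing the next version on its own.
func (m *meta) createTable(n *node, table string) error {
	latest, err := n.createTableOnShard(table)
	if err != nil {
		return err
	}
	m.shardVersion = latest
	return nil
}

// dropTable mirrors createTable for the drop path.
func (m *meta) dropTable(n *node, table string) error {
	latest, err := n.dropTableOnShard(table)
	if err != nil {
		return err
	}
	m.shardVersion = latest
	return nil
}

func main() {
	n := &node{version: 3, tables: map[string]bool{}}
	m := &meta{shardVersion: 3}
	if err := m.createTable(n, "metrics"); err != nil {
		fmt.Println("create failed:", err)
	}
	if err := m.dropTable(n, "metrics"); err != nil {
		fmt.Println("drop failed:", err)
	}
	fmt.Println(m.shardVersion, n.version)
}
```

Because meta copies the version from the response rather than incrementing its own counter, the two sides cannot drift apart on a successful DDL.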