This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

feat(split): secondary start split #675

Merged
hycdong merged 4 commits into XiaoMi:master from secondary_start_split on Nov 30, 2020

Conversation

hycdong
Contributor

@hycdong hycdong commented Nov 26, 2020

Simple partition split process

  1. Meta receives the client's partition split request and changes the partition count (split: meta start partition split #286).
  2. Replica notices that the partition count changed during on_config_sync (feat(split): add splitting_replicas while on_config_sync #653).
  3. Parent partition creates the child partition (split: parent replica create child replica #291).
  4. Parent prepares states for the child to learn (feat(split): parent replica prepare states #299).
  5. Child partition asynchronously learns states from the parent (feat(split): child replica learn parent prepare list and checkpoint #309; feat(split): child replica apply private logs, in-memory mutations and catch up parent #319).
  6. Child notifies the parent that it has caught up (feat(split): child notify parent catch up #390).
  7. Update the child group's partition count (feat(split): add update_child_group_partition_count #645).
  8. Register the child partition (feat(split): register child partition #391).
  9. Update the parent group's partition count (feat(split): parent group update partition count #654).

For more discussion of partition split, see issue #69 and the partition split design doc.
This PR implements part of the third step of the process above (parent partition creates child partition).

What this PR solves

The primary parent learns from on_config_sync that it should start a partition split. It then passes this on to the secondaries through group check: broadcast_group_check puts the split-related fields in group_check_request, and when a secondary receives this request it calls trigger_secondary_parent_split.
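
The hand-off described above can be sketched roughly as follows. This is a minimal standalone illustration, not the actual rDSN code: the `gpid`, `group_check_request`, `primary_replica` and `secondary_replica` types here are simplified, hypothetical stand-ins, and `has_child_gpid` only mimics the "is set" flag of an optional Thrift field. Only the names `broadcast_group_check`, `trigger_secondary_parent_split` and `child_gpid` come from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for dsn::gpid.
struct gpid {
    int32_t app_id;
    int32_t partition_index;
    bool operator==(const gpid &other) const {
        return app_id == other.app_id && partition_index == other.partition_index;
    }
};

// Simplified request: child_gpid is only meaningful when has_child_gpid is true,
// mimicking an optional Thrift field.
struct group_check_request {
    bool has_child_gpid;
    gpid child_gpid;
};

struct secondary_replica {
    bool splitting;
    gpid child;

    secondary_replica() : splitting(false), child{0, 0} {}

    // Stand-in for trigger_secondary_parent_split: remember the child and start splitting.
    void trigger_secondary_parent_split(const gpid &child_gpid) {
        child = child_gpid;
        splitting = true;
    }

    // The secondary starts the split only if the primary attached a child gpid
    // and the split has not been started already.
    void on_group_check(const group_check_request &request) {
        if (request.has_child_gpid && !splitting) {
            trigger_secondary_parent_split(request.child_gpid);
        }
    }
};

struct primary_replica {
    bool splitting;
    gpid child;
    std::vector<secondary_replica *> secondaries;

    primary_replica() : splitting(false), child{0, 0} {}

    // Attach the split-related field to every group check while splitting.
    void broadcast_group_check() {
        group_check_request request{splitting, child};
        for (secondary_replica *secondary : secondaries) {
            assert(secondary != nullptr);
            secondary->on_group_check(request);
        }
    }
};
```

In this sketch a group check sent before the primary starts splitting leaves the secondaries untouched, and repeated group checks during a split trigger the secondary only once.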

// Used to deliver child gpid during partition split
6:optional dsn.gpid child_gpid;
// Used to deliver child gpid and meta_split_status during partition split
6:optional dsn.gpid child_gpid;
Contributor

@neverchanje neverchanje Nov 26, 2020


How does the primary know its child_gpid? I think it can certainly calculate it from its own gpid.
For example, parent_gpid(1, 4) => child_gpid(1, 4+PARTITION_COUNT)
If the primary can, the secondaries can too. The primary does not have to sync this knowledge to the secondaries.

Contributor Author


The primary calculates child_gpid in trigger_primary_parent_split when it finds that the partition_count synced from the meta server is double its own value:

auto child_gpid =
    gpid(get_gpid().get_app_id(),
         get_gpid().get_partition_index() + _replica->_app_info.partition_count);
add_child_request.__set_child_gpid(child_gpid);
parent_start_split(add_child_request);

Syncing it to the secondaries is the safer solution: if a bug occurs, such as a duplicated split, a secondary that computes child_gpid on its own may get the wrong value, with dangerous results.
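
To make the arithmetic in this thread concrete, here is a minimal standalone sketch of the child gpid calculation. The `gpid` struct and the `child_of` helper are hypothetical simplifications, not the rDSN types, and the "already doubled local count" case is only an assumed illustration of how a replica computing the value on its own could go wrong.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for dsn::gpid.
struct gpid {
    int32_t app_id;
    int32_t partition_index;
};

// Child index = parent index + partition count before the split.
// old_partition_count must be the count the replica held *before* doubling.
inline gpid child_of(const gpid &parent, int32_t old_partition_count) {
    assert(old_partition_count > 0);
    assert(parent.partition_index >= 0 && parent.partition_index < old_partition_count);
    return gpid{parent.app_id, parent.partition_index + old_partition_count};
}
```

With a partition count of 8, parent (1, 4) maps to child (1, 12). If a replica's local count had already been doubled to 16, for example after a duplicated split, the same formula would yield (1, 20), which is the kind of mismatch that delivering child_gpid from the primary avoids.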

@neverchanje
Contributor

neverchanje commented Nov 26, 2020

enum split_status
{
    // idle state
    NOT_SPLIT,
    // A replica is splitting into two replicas, one called the parent and one called the child.
    SPLITTING,
    PAUSING,
    PAUSED,
    // After the split is successfully cancelled, the state turns into NOT_SPLIT.
    CANCELING
}

I'd recommend adding some comments to split_status in replication.thrift.

@hycdong hycdong merged commit 65edf61 into XiaoMi:master Nov 30, 2020
@hycdong hycdong deleted the secondary_start_split branch November 30, 2020 04:15
3 participants