This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

feat(split): update group partition count #392

Closed
wants to merge 12 commits into from

Conversation

hycdong
Contributor

@hycdong hycdong commented Feb 12, 2020

Simple partition split process

  1. meta receives the client's partition split request and changes the partition count (split: meta start partition split #286)
  2. replica notices that the partition count has changed during on_config_sync
  3. parent partition creates the child partition (split: parent replica create child replica #291)
  4. parent prepares states for the child to learn (feat(split): parent replica prepare states #299)
  5. child partition asynchronously learns states from the parent (feat(split): child replica learn parent prepare list and checkpoint #309; feat(split): child replica apply private logs, in-memory mutations and catch up parent #319)
  6. child notifies the parent that it has caught up (feat(split): child notify parent catch up #390)
  7. update child group partition count
  8. register child partition (feat(split): register child partition #391)
  9. update parent group partition count
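
To make the role of the partition count in these steps concrete, here is a minimal sketch. It rests on two assumptions that are not stated in this PR: a key is routed by `hash % partition_count`, and a split doubles the partition count, so the child of partition `p` gets index `p + old_count`. `route` and `child_index` are hypothetical helpers, not rDSN functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical routing helper: maps a key hash to a partition index.
// Assumption: routing is a plain modulo on the partition count.
int32_t route(uint64_t key_hash, int32_t partition_count)
{
    assert(partition_count > 0);
    return static_cast<int32_t>(key_hash % static_cast<uint64_t>(partition_count));
}

// Hypothetical helper: index of the child created from `parent_index`
// when the partition count doubles from `old_count`.
int32_t child_index(int32_t parent_index, int32_t old_count)
{
    assert(parent_index >= 0 && parent_index < old_count);
    return parent_index + old_count;
}
```

Under these assumptions, a key that lived on partition `p` lands either on `p` or on `child_index(p, old_count)` once the new count is in effect, which is why every replica in both groups has to agree on the new partition count.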

More partition split discussion in issue #69 and partition split design doc
This PR implements part of the fifth step of partition split, which is shown in bold in the process description.

What this PR solves

  • update_group_partition_count: the primary parent sends update_group_partition_count_request to every replica in the parent group or the child group.
  • on_update_group_partition_count: every replica, no matter whether it is parent or child, primary or secondary, updates app_info in memory and on disk and updates partition_version. In addition, a parent replica cleans up its split context, because the split has finished.
  • on_update_group_partition_count_reply: the primary parent handles update_group_partition_count_response. Once every replica in the child group has updated its partition count, the primary parent registers the child partition on the meta server.
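
The interaction between these three functions can be sketched roughly as follows. This is a simplified, standalone model and not the code in this PR: `replica_state`, `apply_partition_count` and `reply_tracker` are hypothetical names, addresses are plain strings, and deriving `partition_version` as `partition_count - 1` is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for the partition metadata held by one replica.
struct replica_state
{
    int32_t partition_count = 0;
    int32_t partition_version = -1;
    bool split_context_cleaned = false;
};

// Models the idea of on_update_group_partition_count: every replica records
// the new count; a parent replica additionally cleans up its split context.
// Assumption: partition_version is derived as partition_count - 1.
void apply_partition_count(replica_state &r, int32_t new_count, bool is_parent)
{
    assert(new_count > 0);
    r.partition_count = new_count;
    r.partition_version = new_count - 1;
    if (is_parent) {
        r.split_context_cleaned = true;
    }
}

// Hypothetical tracker kept by the primary parent while it waits for replies.
class reply_tracker
{
public:
    explicit reply_tracker(std::unordered_set<std::string> group)
        : _not_replied(std::move(group))
    {
    }

    // Models the idea of on_update_group_partition_count_reply:
    // returns true once every replica in the group has replied.
    bool on_reply(const std::string &address)
    {
        _not_replied.erase(address);
        return _not_replied.empty();
    }

private:
    std::unordered_set<std::string> _not_replied;
};
```

In this model, the primary parent would go on to register the child partition only when `on_reply` returns true for the child group.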

@hycdong hycdong changed the title feat(split): update group partition count [WIP] feat(split): update group partition count Apr 28, 2020
@hycdong hycdong marked this pull request as ready for review April 28, 2020 07:26
acelyc111 previously approved these changes May 8, 2020
@@ -754,6 +756,202 @@ void replica::child_partition_active(const partition_configuration &config) // o
update_configuration(config);
}

// ThreadPool: THREAD_POOL_REPLICATION
void replica::update_group_partition_count(int32_t new_partition_count,
Contributor

parent_update_group_partition_count

Contributor Author

update_group_partition_count - the primary parent sends the request to all replicas
on_update_group_partition_count - all replicas update their partition_count
on_update_group_partition_count_reply - the primary parent's callback
I think the function names above are clear enough, so I did not add a parent prefix to update_group_partition_count.

src/dist/replication/lib/replica_split.cpp (outdated)
@@ -754,6 +756,202 @@ void replica::child_partition_active(const partition_configuration &config) // o
update_configuration(config);
}

// ThreadPool: THREAD_POOL_REPLICATION
void replica::update_group_partition_count(int32_t new_partition_count,
bool is_update_child) // on primary parent
Contributor

This function mixes up update-child and update-parent, making it more complex than it should be.
Consider splitting it into parent_update_parent_group_partition_count and parent_update_child_group_partition_count, and making them both use call_update_group_partition_count_rpc:

// Shared helper (sketch): sends the request to every replica tracked by the primary.
void replica::call_update_group_partition_count_rpc(gpid pid) {
    FAIL_POINT_INJECT_F(...);

    std::unordered_set<dsn::rpc_address> not_replied_addresses;
    // _primary_states.statuses is a map structure: rpc address -> partition_status
    for (const auto &kv : _primary_states.statuses) {
        not_replied_addresses.insert(kv.first);
    }
    for (const auto &kv : _primary_states.statuses) {
        // pseudocode: send the request to kv.first and handle its reply on this replica
        rpc.call([this]() { on_update_group_partition_count_reply(); });
    }
}
void replica::parent_update_parent_group_partition_count(int32_t new_partition_count) {
    ddebug_replica("start to update parent group partition count, new partition count = {}",
                   new_partition_count);
    call_update_group_partition_count_rpc(get_gpid());
}
void replica::parent_update_child_group_partition_count(int32_t new_partition_count) {
    // validation is only needed when updating the child group
    if (_child_gpid.get_app_id() == 0 || _child_init_ballot != get_ballot()) {
        dwarn_replica("can not update group partition count because current state is out-dated, "
                      "_child_gpid({}), _child_init_ballot = {}, local ballot = {}",
                      _child_gpid,
                      _child_init_ballot,
                      get_ballot());
        _stub->split_replica_error_handler(
            _child_gpid,
            std::bind(&replica::child_handle_split_error,
                      std::placeholders::_1,
                      "update_group_partition_count because out-dated request"));
        parent_cleanup_split_context();
        return;
    }
    ddebug_replica("start to update child group partition count, new partition count = {}",
                   new_partition_count);
    call_update_group_partition_count_rpc(get_gpid());
}

Contributor Author

@hycdong hycdong May 8, 2020

I don't think the logic is too complex. It goes like this:

Step 1: when updating the child partition_count, do some validation
Step 2: build the not_replied_addresses set
Step 3: send the request to the replica group

In your suggestion, call_update_group_partition_count_rpc would implement steps 2 and 3. However, update_group_partition_count_request differs depending on whether it is sent to the parent group or the child group; see L792 to L807.
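
As a rough illustration of the single-function shape described in this reply, the sketch below keeps the three steps together and lets one flag decide both whether validation runs and which partition the request targets. It is a simplified model, not the code at L792 to L807: `update_request`, its fields and `build_update_requests` are hypothetical, and the real request may differ in other ways.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified request; the fields of the real
// update_group_partition_count_request are not shown in this thread.
struct update_request
{
    std::string target_address;
    int32_t target_partition_index;
    int32_t new_partition_count;
};

// Returns the requests that would be sent to the group,
// or an empty vector when validation of the child state fails.
std::vector<update_request> build_update_requests(const std::vector<std::string> &group,
                                                  int32_t parent_index,
                                                  int32_t child_index,
                                                  bool child_is_valid,
                                                  int32_t new_partition_count,
                                                  bool is_update_child)
{
    assert(new_partition_count > 0);
    // Step 1: validation applies only when updating the child group.
    if (is_update_child && !child_is_valid) {
        return {};
    }
    // Steps 2 and 3: one request per replica address; the target partition
    // depends on whether the parent group or the child group is updated.
    const int32_t target = is_update_child ? child_index : parent_index;
    std::vector<update_request> requests;
    requests.reserve(group.size());
    for (const auto &address : group) {
        requests.push_back({address, target, new_partition_count});
    }
    return requests;
}
```

The trade-off under discussion is visible here: a shared helper would have to take the same flag (or the target partition) as a parameter, so splitting the function mostly moves the branch rather than removing it.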

hycdong added a commit to hycdong/rdsn that referenced this pull request Jun 24, 2020
@hycdong hycdong marked this pull request as draft September 11, 2020 05:53
@hycdong hycdong closed this Oct 21, 2020
@hycdong hycdong deleted the update_group_partition_count branch October 29, 2020 07:43
3 participants