
[RIP 19] Server side rebalance, lightweight consumer client support

rongtong edited this page Feb 7, 2022 · 2 revisions

Status

Background & Motivation

What do we need to do

  • Will we add a new module?
    No.
  • Will we add new APIs?
    Yes.
  • Will we add a new feature?
    Yes.

Why should we do that

  • Are there any problems with our current project?
    The current consumer load-balancing strategy works at message-queue granularity, and every step is driven by the client side. There are three main steps:
    1. Each consumer periodically fetches the total number of message queues of the topic and the full set of consumers.
    2. Each consumer sorts the queues by queue index and the consumers by IP, then runs the same allocation algorithm to work out which message queues are allocated to it.
    3. Each consumer pulls messages from the queues allocated to it in step 2.

With this allocation method, if consumption of a queue becomes abnormal (the application itself is abnormal, or a broker is being upgraded) and slows down or stops, messages accumulate in that queue. Because the queue is not re-allocated to another consumer, the backlog keeps growing for as long as the problem goes unhandled.

  • What can we benefit from the proposed changes?
    When consumption of a queue becomes abnormal, other consumers can quickly take over the accumulated messages and relieve the backlog.
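The three-step client-side allocation described above can be sketched as follows. This is a minimal illustration of an "average" allocation, not RocketMQ's actual code: the class name `ClientSideAllocationSketch`, the method `allocate`, and the sample IPs are all hypothetical, and consumers are sorted as plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ClientSideAllocationSketch {

    /**
     * Returns the queue indexes owned by {@code self}. Every consumer runs
     * this same function on the same inputs, so the results never overlap.
     * (Hypothetical helper, for illustration only.)
     */
    public static List<Integer> allocate(String self, List<String> consumers, int queueCount) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted);              // step 2: consumers in a shared order
        int index = sorted.indexOf(self);
        List<Integer> owned = new ArrayList<>();
        if (index < 0) {
            return owned;                      // an unknown consumer owns nothing
        }
        int n = sorted.size();
        int base = queueCount / n;
        int extra = queueCount % n;            // the first `extra` consumers get one more
        int start = index * base + Math.min(index, extra);
        int size = base + (index < extra ? 1 : 0);
        for (int q = start; q < start + size; q++) {
            owned.add(q);                      // queues are already ordered by index
        }
        return owned;
    }

    public static void main(String[] args) {
        List<String> consumers = List.of("10.0.0.2", "10.0.0.1", "10.0.0.3");
        for (String c : consumers) {
            System.out.println(c + " -> " + allocate(c, consumers, 8));
        }
    }
}
```

With 8 queues and these three consumers, `10.0.0.1` owns queues 0 to 2, `10.0.0.2` owns 3 to 5, and `10.0.0.3` owns 6 and 7. If `10.0.0.2` stalls while still registered, queues 3 to 5 stay assigned to it, which is the accumulation problem this RIP targets.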

Goals

  • What problem is this proposal designed to solve?
    When consumption of a queue becomes abnormal, other consumers can quickly take over the accumulated messages and relieve the backlog.
  • To what degree should we solve the problem?
    This RIP must guarantee the following two points:
    1. High availability: consumption of a single message queue is not affected by the failure of any one consumer client.
    2. High performance: pop-mode subscription affects message consumption latency and throughput by less than 10%.

Non-Goals

  • What problem is this proposal NOT designed to solve?
    Improve client-side load balancing.
  • Are there any limits of this proposal?
    Nothing specific.

Changes

Architecture


Current "Pull mode":


Proposed "Pop mode":


Move inter-queue balancing for a topic from the client side to the server side. Clients send pull requests to the broker without specifying queues; the broker fetches messages from its queues internally and returns them, so one queue can be consumed by multiple clients. The overall behavior resembles popping from a queue.

The proposal adds a new request command for querying queue assignments in the broker, and adds a pop-feature-support flag to the pull request that makes the broker use pop mode.
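The pop behavior described above can be illustrated with a small in-memory sketch. It assumes a round-robin walk over queues, which the proposal does not specify, and it leaves out acknowledgement and redelivery, which this section does not describe. The names `PopModeSketch`, `put`, and `pop` are hypothetical and are not broker APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class PopModeSketch {
    private final List<Queue<String>> queues = new ArrayList<>();
    private int cursor = 0; // queue to try first on the next request

    public PopModeSketch(int queueCount) {
        if (queueCount <= 0) {
            throw new IllegalArgumentException("queueCount must be positive");
        }
        for (int i = 0; i < queueCount; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    public synchronized void put(int queueId, String message) {
        queues.get(queueId).add(message);
    }

    /**
     * Serves a request that names no queue: the broker walks its queues
     * starting at the cursor and returns up to maxMessages to whichever
     * client asked. (Round-robin is an assumption of this sketch.)
     */
    public synchronized List<String> pop(int maxMessages) {
        List<String> result = new ArrayList<>();
        int start = cursor;
        for (int i = 0; i < queues.size() && result.size() < maxMessages; i++) {
            Queue<String> q = queues.get((start + i) % queues.size());
            while (result.size() < maxMessages && !q.isEmpty()) {
                result.add(q.poll());
            }
        }
        cursor = (start + 1) % queues.size(); // rotate the starting queue
        return result;
    }

    public static void main(String[] args) {
        PopModeSketch broker = new PopModeSketch(2);
        broker.put(0, "a");
        broker.put(0, "b");
        broker.put(1, "c");
        System.out.println("consumer-1 got " + broker.pop(1));
        System.out.println("consumer-2 got " + broker.pop(2));
    }
}
```

In this run the first request receives `a` from queue 0 and the second receives `c` from queue 1 followed by `b` from queue 0, so both clients end up consuming from queue 0. No client owns a queue, which is why a slow client no longer blocks one.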

Interface Design/Change

  • Method signature changes
    Nothing specific.
  • Method behavior changes
    Nothing specific.
  • CLI command changes
    Add setConsumeMode so that an admin can switch one subscription between the old pull mode and the new pop mode.
  • Log format or content changes
    Nothing specific.

Compatibility, Deprecation, and Migration Plan

  • Are backward and forward compatibility taken into consideration?
    New RequestCode values are added between client and broker, so there are 2 compatibility situations:
    1. Old client + new broker: old clients do not send requests with the pop-feature-support flag, so the broker does not enable pop mode and everything behaves as before.
    2. New client + old broker: new clients detect whether the broker supports the new request command for querying queue assignments; if it does not, they fall back to the old pull mode.
  • Are there deprecated APIs?
    Nothing specific.
  • How do we do migration?
    Nothing specific.
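The two compatibility situations listed above reduce to a pair of small decisions, sketched here under stated assumptions. The class `ConsumeModeNegotiationSketch`, the `Mode` enum, and both method names are hypothetical; how a client actually detects broker support is not specified in this proposal, so it is modeled as a plain boolean.

```java
class ConsumeModeNegotiationSketch {

    enum Mode { PULL, POP }

    /**
     * Client side: use pop mode only when the broker supports the new
     * request command for querying queue assignments; otherwise fall back.
     * (The detection mechanism itself is an assumption, passed in as a flag.)
     */
    public static Mode chooseClientMode(boolean clientSupportsPop,
                                        boolean brokerSupportsAssignmentQuery) {
        return (clientSupportsPop && brokerSupportsAssignmentQuery) ? Mode.POP : Mode.PULL;
    }

    /** Broker side: enable pop mode only for requests carrying the pop-feature-support flag. */
    public static Mode chooseBrokerMode(boolean requestHasPopFlag) {
        return requestHasPopFlag ? Mode.POP : Mode.PULL;
    }

    public static void main(String[] args) {
        // Old client + new broker: no flag on the request, so the broker stays in pull mode.
        System.out.println("old client, new broker: " + chooseBrokerMode(false));
        // New client + old broker: the query command is unsupported, so the client falls back.
        System.out.println("new client, old broker: " + chooseClientMode(true, false));
        // New client + new broker: both sides agree on pop mode.
        System.out.println("new client, new broker: " + chooseClientMode(true, true));
    }
}
```

Defaulting to pull mode whenever either side lacks the feature keeps mixed-version clusters on the existing behavior.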

Implementation Outline


We will implement the proposed changes in 2 phases.

Phase 1

  1. Implement server-side balancing capability in the broker.
  2. Implement client-side requests using the new pop mode.

Phase 2

  1. Implement compatibility of the new SDK with old brokers.
  2. Implement feature detection in broker and client.

Rejected Alternatives

How do alternatives solve the issue you proposed?


Improve the client rebalance logic? We have not found a good way to do this.

Pros and Cons of alternatives


The client rebalance logic would become quite complicated.

Why should we reject the above alternatives
