Skip to content
RabbitMQ tool and plugin for balancing out Queue Masters in a clustered installation
Branch: v3.7.x
Clone or download
Latest commit 70d8c25 Jun 14, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
include Introduce message relative synchronization verification Jun 13, 2019
priv/images Updates: Oct 22, 2017
src Bump up default FSM state event call timeout Jun 13, 2019
test
.gitignore fix use of non-active queue master pids on balance transitions Nov 23, 2018
Makefile Introduce message relative synchronization verification Jun 13, 2019
README.md Update documentation Jun 14, 2019
erlang.mk Update erlang.mk and rabbitmq-components.mk Jun 13, 2019
rabbitmq-components.mk Update erlang.mk and rabbitmq-components.mk Jun 13, 2019

README.md

RabbitMQ Queue Master Balancer

RabbitMQ Queue Master Balancer is a tool used for attaining queue master equilibrium across a RabbitMQ cluster installation. The plugin achieves this by computing queue master counts on nodes and engaging in shuffling procedures, with the ultimate goal of evenly distributing queue masters across RabbitMQ cluster nodes. Internally, the tool comprises of an FSM engine which transitions between different states of operation, to allow the procedures to be carried in a fully controllable manner.

We define Queue Equilibrium as the state in which the following condition is/has been satisfied across all running cluster nodes:

inline fit

Points to consider prior usage:

  • This plugin will only apply to mirrored queues (with or without messages), and non-mirrored queues without messages. For non-mirrored queues holding messages, we recommend users first setup an HA policy of choice, based on their cluster setup and needs, and then make use of this plugin.
  • The plugin considers single, non-mirrored queues holding messages a risk to migrate around cluster nodes, for example, in case of any network problems during migration procedures of such a queue, possibility of losing messages cannot be ignored. It therefore ignores such queues, up until users have setup an HA policy of choice and have slave queues running on the cluster for message redundancy. This is a safety precaution to protect users from any possibility of message loss during queue migration.
  • The plugin is young, and yet to mature. We strictly recommend usage for RabbitMQ support operations only, at planned and scheduled time periods when the cluster nodes are under minimal, or most preferably, zero traffic load.

Design

The following diagram depicts the underlying design and operational concepts of the Queue Master Balancer FSM engine.

inline fit

As illustrated, the plugin is at any time in it's operation in one of 4 states, which are:

  • IDLE: In this state, the plugin is at rest and carries out no actions while awaiting to be triggered into READY or BALANCING QUEUES states depending on the received event, which are $load_queues or $balance_queues event.
  • READY: The plugin engages the READY state whenever it is loaded with queues. The $load_queues event will cause the plugin to acquire all queues in preparation for balancing them out in the cluster, thus entering the READY state.
  • BALANCING QUEUES: This is the state in which the plugin is "working", i.e. moving queues around the cluster to achieve queue equilibrium. The $balance_queues event will bring the plugin into this state, depending on its previous state.
  • PAUSE: While in BALANCING QUEUES state, the plugin may be paused by triggering the $pause event. Once in PAUSE state, the plugin retains the current pending queues still to be balanced, and awaits reception of $continue event to proceed in balancing them out. This $continue event takes the plugin's state back to BALANCING QUEUES.

In addition, regardless of its current state, the Queue Master Balancer plugin is always receptive to $load_queues, $stop and $reset events. Details on events and their execution are discussed in Operation section.

Supported RabbitMQ Versions

This plugin is compatible with RabbitMQ 3.6.x and beyond (3.7.x), to the latest release.

Installation

Packaging

Download pre-compiled versions from https://github.com/Ayanda-D/rabbitmq-queue-master-balancer/releases

Build

Clone and switch to branch v3.6.x or v3.7.x to build the plugin for RabbitMQ versions 3.6.x or 3.7.x respectively, then execute make. To create a package, execute make dist and find the .ez package file in the plugins directory.

Ensure the RabbitMQ version you are building/packaging for is compatible to the Erlang, as well as Elixir versions installed. For more detail on this, please refer to https://www.rabbitmq.com/which-erlang.html

Testing

Likewise, clone and switch to branch v3.6.x or v3.7.x to build the plugin for RabbitMQ versions 3.6.x or 3.7.x respectively, then execute make tests to test the plugin. View test results from the generated HTML files.

NOTE: Tests can take as long as 15 minutes to execute.

Configuration

The Queue Master Balancer is configured in the rabbitmq.config file for versions 3.6.x, or in the advanced.config file, for versions 3.7.x or later. The following is an example of how it is configured:

[{rabbitmq_queue_master_balancer,
     [{operational_priority,      5},
      {preload_queues,            false},
      {sync_delay_timeout,        3000},
      {sync_verification_factor,  300},
      {policy_transition_delay,   100}]
 }].

The following table summarizes the meaning of these configuration parameters.

PARAMETER NAME ABBREV DESCRIPTION TYPE DEFAULT
operational_priority OP Priority level the plugin will use to balance queues across the cluster. This should be higher than the highest configured policy priority Integer 5
preload_queues PQ Determines whether queues are automatically loaded on plugin start-up before the balancing operation is started Boolean false
sync_verification_factor SVF Time period factor (in milliseconds), relative to the message count per queue currently being balanced, which the plugin will apply and use to ensure message synchronization when a queue has undergone balance transitions. The delay factor is applied for 100 messages (plus the next), hence if for example, if a queue has a message count of 5110, then the effective delay to ensure synchronization will be 15600 milliseconds Integer 300
sync_delay_timeout SDT Time period (in milliseconds) the plugin should wait for slave queues to synchronize to the master queue before balancing procedure is marked complete Integer 3000
policy_transition_delay PTD Time period (in milliseconds), relative to the message count per queue currently being balanced, the plugin should wait while changing/transitioning from one policy to another. The plugin undergoes 4 transitions when balancing a queue and similar to SVF, the configured PTD is proportionally applied for every 100 messages (plus the next) in the current queue undergoing balance transitions Integer 50

Operation

For RabbitMQ versions 3.6.x, the plugin is operated and executed using the traditional and classic rabbitmq-plugins and rabbitmqctl eval command line interfaces. Supported operations are as follows:

1. Enable plugin

The plugin is enabled like any other standard RabbitMQ plugin.

rabbitmq-plugins enable rabbitmq_queue_master_balancer

2. Get plugin information

The plugin may be queried for its current state information at any point of its operation. This is carried out by executing the info call:

rabbitmqctl eval 'rabbit_queue_master_balancer:info().'

3. Get plugin status

The plugin may be queried for it's status information at any point during its execution run. Unlike the info query, the status query is independent of the FSM engine's operational procedures, and gives a short summary of the engine's Process Identifier, Queues Pending Balance and Memory Utilization. The status call is executed as follows:

rabbitmqctl eval 'rabbit_queue_master_balancer:status().'

4. Load queues

Once enabled, the plugin must be loaded with queues which it's going to balance across the cluster. If the preload_queues configuration parameter is set to true in the rabbitmq.config file, this procedure may not be necessary. However, queues may still be loaded as follows regardless of the plugin's configuration:

rabbitmqctl eval 'rabbit_queue_master_balancer:load_queues().'

5. Balance loaded queues

Once queues have been loaded, the balancing procedures may then be executed by issuing the go command as follows:

rabbitmqctl eval 'rabbit_queue_master_balancer:go().'

6. Pause balancing

While queues are undergoing balancing, the plugin may be paused, at which point it switches into its PAUSE state, retaining the queues pending balance up until that point in time. The following command is executed to trigger the plugin's PAUSE state:

rabbitmqctl eval 'rabbit_queue_master_balancer:pause().'

7. Continue balancing

To proceed back into BALANCING QUEUES from the PAUSE state, the continue command is executed as follows:

rabbitmqctl eval 'rabbit_queue_master_balancer:continue().'

8. Reset plugin

The plugin may be reset at any point in time, setting it back to its IDLE state and in the following manner.

rabbitmqctl eval 'rabbit_queue_master_balancer:reset().

9. Stop plugin

In context of the plugin, stopping is similar to reset, in that it is set back to IDLE. The different being the state variables are maintained, and the plugin may still be queried for its last information:

rabbitmqctl eval 'rabbit_queue_master_balancer:stop().'

10. Report

To acquire a report illustrating the distribution of queues across the cluster (at any moment in time), the following command may be used:

rabbitmqctl eval 'rabbit_queue_master_balancer:report().'

11. Disable plugin

The plugin is disabled as follows:

rabbitmq-plugins disable rabbitmq_queue_master_balancer

Additional information

The following aspects must be put into consideration when putting it to use:

  • Queue balancing is a delicate operation which must be carried out in a very controlled manner and environment not prone to network partitions. Ensure your network is in a stable condition prior to executing queue balancing procedures.
  • Configuration parameters such as PTD need to be bumped up as aspects such as cluster size, queue slave count and message size increase.
  • The distribution of Queues within a cluster is non-deterministic and a single execution round of the Queue Master Balancer may not be enough to attain immediate equilibria. The plugin may (or may not) need multiple execution rounds before satisfactory queue equilibrium is attained.

Example Usage

This link illustrates the plugin in full use. An unbalanced 3-node cluster exists, consisting of 21 queues, in which all queue masters reside on a single node: 'rabbit@Ayandas-MacBook-Pro'. The Queue Master Balancer is then executed, and queried for its status information during the process until balancing procedures have completed. To illustrate its results, we generate queue distribution report of the cluster prior balancing the queues, and another after the balancing procedures have completed. The results summary are as follows:

  • Report before Queue Master Balancer is executed:
Ayandas-MacBook-Pro:sbin ayandadube$ ./rabbitmqctl eval 'rabbit_queue_master_balancer:report().'
{ok,[{'rabbit_2@Ayandas-MacBook-Pro',{queues,0}},
     {'rabbit@Ayandas-MacBook-Pro',{queues,21}},
     {'rabbit_1@Ayandas-MacBook-Pro',{queues,0}}]}
  • Report after Queue Master Balancer is executed:
Ayandas-MacBook-Pro:sbin ayandadube$ ./rabbitmqctl eval 'rabbit_queue_master_balancer:report().'
{ok,[{'rabbit_2@Ayandas-MacBook-Pro',{queues,7}},
     {'rabbit@Ayandas-MacBook-Pro',{queues,6}},
     {'rabbit_1@Ayandas-MacBook-Pro',{queues,8}}]}

The resulting distribution of queues across the cluster is near even, a state in which we have attained an acceptable level of queue equilibrium. Computing Queue Equilibrium for the node with the least number of queues, 'rabbit@Ayandas-MacBook-Pro', with 6 queues, yields the following percentage:

inline fit

which satisfies condition/equation [i], being >= 70%.

License and Copyright

(c) Erlang Solutions Ltd. 2017-2019

https://www.erlang-solutions.com/

You can’t perform that action at this time.