RabbitMQ OCF RA M/S HA cluster agent migration #1698
Commits on Apr 16, 2015
-
Backward-compatible commit for packaging of fuel-library
Based on Change-Id: Ie759857fb94db9aa94aaeaeda2c6ab5bb159cc9e
All the work done for fuel-library packaging.
Should be overridden by the change above after we switch CI to package-based.
Implements blueprint: package-fuel-components
Change-Id: I48ed37a009b42f0a9a21cc869a869edb505b39c3
Vladimir Kuklin committed Apr 16, 2015 (f5e2cc0)
Commits on May 14, 2015
-
All the work done for fuel-library packaging
1) Package fuel library into three different packages: RPM: fuel-library6.1 ALL: fuel-ha-utils, fuel-misc 2) Install packages onto slave nodes implements blueprint: package-fuel-components Change-Id: Ie759857fb94db9aa94aaeaeda2c6ab5bb159cc9e
Vladimir Kuklin committed May 14, 2015 (ec56073)
Commits on May 21, 2015
-
Check hostlist against starting and active resources
This commit makes the post-start notify action check that the hostlist of nodes to be joined to the cluster contains not only the nodes that will be started but also the ones that are already started. This fixes the case when Pacemaker sends notifications only for the latest event, so a node that is not included in the start list would not join the cluster. It also checks whether the node is already clustered and skips the join if it is not needed.
Change-Id: Ibe8ecdcfe42c14228350b1eb3c9d08b1a64e117d
Closes-bug: #1455761
Vladimir Kuklin committed May 21, 2015 (bbb3793)
-
Check whether beam is started before running start_app
There is a mistake in the OCF logic: after a Mnesia reset it tries to start the rabbitmq app without a running beam, getting into a loop that constantly fails until it times out.
Change-Id: Id096961e206a083b51978fc5034f99d04715d7ea
Related-bug: #1436812
Vladimir Kuklin committed May 21, 2015 (f97fb5c)
Commits on May 22, 2015
-
Sync rabbit OCF code diverge to packages
W/o this patch, the code in the OCF script from the deployment/ dir will never get to the fuel-library packages, which are built from the files/ and debian/ dirs only. The solution is:
1) sync the diverged code to files/ and debian/
2) either remove the source OCF file or update the way the files are linked.
This patch addresses only step 1, as it is not yet decided how to deal with step 2.
Related-bug: #1457441
Related-bug: #184966
Change-Id: Ied86640e8e853de99bcd26f1ae726fc8272b6db7
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed May 22, 2015 (8a5d91b)
-
W/o this fix, when the rabbit app cannot start due to a corrupted mnesia state, mnesia would not be cleaned completely. This may prevent the rabbit app from starting and take the node out of the cluster permanently. The solution is to remove all mnesia files related to the rabbit node.
Closes-bug: #1457766
Change-Id: I680efbf573c22aa9a13d8429d985b5a57235b2bf
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed May 22, 2015 (bae52d6)
Commits on May 25, 2015
-
Fix rabbit OCF demote/stop/promote actions
* When a rabbit node goes down, its status remains 'running' in the mnesia db for a while, so a few retries (50 sec in total) are required in order to kick and forget this node from the cluster. This also requires +50 sec for the stop and demote action timeouts.
* The rabbit master score in the CIB is retained after the current master is moved manually. This is wrong, and the score must be reset ASAP for post-demote and post-stop as well.
* The demoted node must be kicked from the cluster by other nodes on post-demote processing.
* Post-demote should stop the rabbit app at the node being demoted, as this node should be kicked from the cluster by other nodes. Instead, it stops the app at the *other* nodes and brings full cluster downtime.
* The check to join should be done only at post-start and not at post-promote; otherwise the node being promoted may think it is clustered with some node while the join check reports it as already clustered with another one. (The regression was caused by https://review.openstack.org/184671)
* Change the `hostname` call to `crm_node -n` via $THIS_PCMK_NODE everywhere to ensure we are using the correct pacemaker node name.
* Handle empty values for OCF_RESKEY_CRM_meta_notify_* by reporting the resource as not running. This will rerun the resource and eventually restore its state.
Closes-bug: #1436812
Closes-bug: #1455761
Change-Id: Ib01c1731b4f06e6b643a4bca845828f7db507ad3
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed May 25, 2015 (7af8c43)
-
Add rabbit OCF functions to get pacemaker node names
W/o this fix, the failover time was longer than expected, as rabbit nodes were able to query corosync nodes that had left the cluster and also tried to join them in the rabbit cluster, ending up being reset and rejoining alive nodes later.
1) Add functions: a) to get all alive nodes in the partition b) to get all nodes. This fixes get_monitor behaviour so that it ignores attributes for dead nodes, as crm_node behaviour changed with the upgrade of pacemaker. So rabbit nodes will never try to join the dead ones.
2) Fix bash scopes for local variables. A minor change removing unexpected behavior when a local variable impacts the global scope.
Related-bug: #1436812
Change-Id: I89b716b4cd007572bb6832365d4424669921f057
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed May 25, 2015 (85dabaa)
Commits on May 27, 2015
-
Check if the rabbitmqctl command is responding
W/o this fix, rabbitmqctl may sometimes hang, failing many commands. This is a problem, as it brings the rabbit node to an unresponsive and broken state. It may also affect entire cluster operations, for example when the failed command is forget_cluster_node. The solution is to check for the cases when the command rabbitmqctl list_channels timed out and was killed or terminated, with exit code 137 or 124, and return a generic error. The related confusing error message "get_status() returns generic error", which may be logged when the rabbit node is running out of the cluster, is fixed as well.
Closes-bug: #1459173
Change-Id: Ia52fc5f2ab7adb36252a7194f9209ab87ce487de
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
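The exit-code check described in this commit message can be sketched in shell. This is a minimal illustration rather than the agent's actual code: `run_with_timeout` is a hypothetical helper, and it assumes GNU coreutils `timeout`, which exits with 124 when the time limit is reached, while a command killed with SIGKILL yields 137 (128 + 9).

```shell
# Standard OCF return code values (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical helper: run a command under a time limit and map a
# timeout (124) or a SIGKILL (137) to a generic error.
run_with_timeout() {
    local limit="$1"
    shift
    local rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        124|137)
            echo "command timed out or was killed (rc=$rc): $*" >&2
            return "$OCF_ERR_GENERIC"
            ;;
        *)
            # Any other exit code is passed through unchanged.
            return "$rc"
            ;;
    esac
}
```

In an agent, a call such as `run_with_timeout 30 rabbitmqctl list_channels` would then report a hang as a generic error instead of blocking the monitor action; the 30-second limit here is only a placeholder.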
Bogdan Dobrelya committed May 27, 2015 (b2b60c5)
-
Add second monitor operation to check RabbitMQ
This commit checks whether there is a running cluster of rabbitmq and if rabbitmq app is running on the node and exits with non-zero code if current node is not running rabbitmq, but should do so Change-Id: I2098405b39ade7325b94781aeb997de0937bdf4c Closes-bug: #1458828
Vladimir Kuklin committed May 27, 2015 (d0f4a4c)
Commits on Jun 3, 2015
-
Erase mnesia if a rabbit node cannot join the cluster
W/o this fix, a rabbit node may get stuck in a start/stop loop, failing to join the cluster with the error: "no_running_cluster_nodes, You cannot leave a cluster if no online nodes are present." This is an issue because the rabbit node should always be able to join the cluster if it was ordered to start by the pacemaker RA. The solution is to force the mnesia reset if the rabbit node cannot join the cluster on post-start notify. Note that the node is not reset when the master is starting, so mnesia will be kept intact at least on the resource master.
Partial-bug: #1461509
Change-Id: I69bc13266a1dc784681b2677ae5616bfc28cf54f
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Jun 3, 2015 (a57312c)
Commits on Jun 12, 2015
-
Restart rabbit if can't list queues or found memory alert
W/o this fix, a dead-end situation is possible when the rabbit node has no free memory left and the cluster blocks all publishing, by design. But the app thinks "let's wait for the publish block to be lifted" and cannot recover. The workaround is to monitor the results of crucial rabbitmqctl commands and restart the rabbit node if queues/channels/alarms cannot be listed or if memory alarms are found. This is similar to the logic we have for the cases when rabbitmqctl list_channels hangs. The channels check is also fixed to verify whether the exit code is >0 when the rabbit app is running. The additional checks added to the monitor also require extending the timeout window for the monitor action from 60 to 180 seconds.

Besides that, this patch makes the monitor action gather the rabbit status and runtime stats, such as memory consumed by all queues out of total Mem+Swap, total messages in all queues, and average queue consumer utilization. This info should help to troubleshoot failures better.

DocImpact: ops guide. If any rabbitmq node exceeds its memory threshold, publishing becomes blocked cluster-wide, by design. In such cases, this rabbit node would be recovered from the raised memory alert and immediately stopped, to be restarted later by pacemaker. Otherwise, this blocked publishing state might never be lifted if the pressure from the OpenStack apps side persists.

Closes-bug: #1463433
Change-Id: I91dec2d30d77b166ff9fe88109f3acdd19ce9ff9
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Jun 12, 2015 (5415505)
Commits on Jul 7, 2015
-
W/o this fix, the list of file names not accessible by the rabbitmq user is treated as multiple arguments to the if command, causing it to throw the "too many arguments" error and the chown command to be skipped. This is a problem, as it might prevent the rabbitmq server from starting because of bad file ownership. The solution is to pass the list of files as a single argument "${foo}".
Closes-bug: #1472175
Change-Id: I1d00ec3f31cd0f023bd58a4e11e5b31659977229
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
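A minimal sketch of the quoting problem this commit describes. The variable name `files`, the paths, and the two helper functions are hypothetical; the original test expression is not shown in the commit message, so `[ -n ... ]` is assumed here purely for illustration.

```shell
# Hypothetical stand-in for the list of files not owned by the rabbitmq user.
files="/var/lib/rabbitmq/a /var/lib/rabbitmq/b /var/lib/rabbitmq/c"

# Unquoted: the list is split into several words, so the test command
# receives extra arguments, fails with an error, and the branch is skipped.
check_unquoted() {
    if [ -n $files ]; then
        echo "would chown"
    else
        echo "skipped"
    fi
}

# Quoted: the whole list is passed as a single argument.
check_quoted() {
    if [ -n "${files}" ]; then
        echo "would chown"
    else
        echo "skipped"
    fi
}
```

With several files in the list, `check_unquoted` prints an error on stderr and takes the `else` branch, which is how the chown ended up being skipped; `check_quoted` takes the intended branch.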
Bogdan Dobrelya committed Jul 7, 2015 (4a7a8e0)
-
Fix error return codes for rabbit OCF
W/o this fix the situation is possible when rabbit OCF returns OCF_NOT_RUNNING in the hope of future restart of the resource by pacemaker. But in fact, pacemaker will not trigger restart action if monitor returns "not running". This is an issue as we want resource restarted. The solution is to return OCF_ERR_GENERIC instead of OCF_NOT_RUNNING when we expect the resource to be restarted (which is action stop plus action start). Closes-bug: #1472230 Change-Id: I10c6e43d92cb23596636d86932674b36864d1595 Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
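The distinction between the two return codes can be sketched as follows. The numeric values are the standard OCF ones; `monitor_rc` and its state labels are hypothetical and only illustrate the rule stated in the commit message, not the agent's real monitor logic.

```shell
# Standard OCF return code values (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical helper: $1 is an observed state label.
monitor_rc() {
    case "$1" in
        healthy) return "$OCF_SUCCESS" ;;
        stopped) return "$OCF_NOT_RUNNING" ;;  # cleanly down, no restart expected
        *)       return "$OCF_ERR_GENERIC" ;;  # broken: ask for stop plus start
    esac
}
```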
Bogdan Dobrelya committed Jul 7, 2015 (64e2098)
Commits on Jul 9, 2015
-
Commit c87ebae
Commits on Jul 13, 2015
-
Commit f9a87be
Commits on Jul 21, 2015
-
Implement the dumping of rabbitMQ definitions
This change leverages the rabbitmq management plugin to dump exchanges, queues, bindings, users, virtual hosts, permissions and parameters from the running system. Specifically, this change adds the following:
* Dumping of rabbitMQ definitions (users/vhosts/exchanges/etc) at the end of the deployment.
* The possibility to restore definitions in the rabbitmq-server ocf script during rabbitMQ startup.
* The rabbitmq admin plugin enabled, but restricted to localhost traffic.
This reverts Ic01c26200f6019a8112b1c5fb04a282e64b3b3e6 but adds firewall rules to mitigate the issue.
DocImpact: The dump_rabbit_definitions task can be used to back up the rabbitmq definitions; if custom definitions (users/vhosts/etc) are created, it must be run or the changes may be lost during rabbitmq failover via pacemaker.
Change-Id: I715f7c2ae527f7e105b9f6b7d82c443e8accf178
Closes-bug: #1383258
Related-bug: #1450443
Co-Authored-By: Alex Schultz <aschultz@mirantis.com>
Commit a9d8664
Commits on Aug 13, 2015
-
Fix rabbitmq data restore for large datasets
Previously we were sending the json backup data on the command line which fails when the dataset is large. This change updates the command line options for curl to pass the filename directly and let it handle the reading of the data. Change-Id: I37f298279beca06df41fb08e1745602976c6a776 Closes-Bug: 1383258
Alex Schultz committed Aug 13, 2015 (44c24cd)
Commits on Aug 27, 2015
-
Add more logs to rabbitmq get_status function
It is really hard to debug when get_status() returns only $OCF_NOT_RUNNING and loses the exit code and error output. Added more logs to avoid this situation.
Related-Bug: #1488999
Change-Id: Id0999235d7be688f55799e2952fe22e97b678ce7
Commit fb89b78
Commits on Sep 3, 2015
-
Detect a last man standing for rabbit OCF agent
W/o this patch, a race condition is possible when there are no running rabbit nodes/resource master. The rabbit nodes will start/stop in an endless loop as a result, introducing full downtime for the AMQP cluster and the cloud control plane. The solution is:
* On post-start/post-promote notify, do nothing if either of the following is true:
  - there are no rabbit resources running or no master
  - the list of rabbit resources being started/promoted is reported empty
* For such cases, do not report resource failure, and delegate recovery, if needed, to the "running out of the cluster" monitor's logic.
* Additionally, report a last man standing when there are no running rabbit resources around.
Closes-bug: #1491306
Change-Id: If1c62fac26b63410636413c49fce55c35e53dc5f
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Sep 3, 2015 (f72a006)
Commits on Sep 4, 2015
-
Make RabbitMQ OCF script tolerate rabbitmqctl timeouts
The change makes the OCF script ignore a small number of rabbitmqctl timeouts for 'heavy' operations: list_channels, get_alarms and list_queues. The number of tolerated timeouts in a row is configured through a new variable, 'max_rabbitmqctl_timeouts'. By default it is set to 1, i.e. rabbitmqctl timeouts are not tolerated at all.

Bug #1487517 is fixed by extracting the declaration of local variables 'rc_alarms' and 'rc_queues' from assignment operations.

Text for Operations Guide:

If other processes consume a significant part of the CPU on a node where RabbitMQ is deployed, RabbitMQ starts responding slowly to queries from the 'rabbitmqctl' utility. The utility is used by RabbitMQ's OCF script to monitor the state of RabbitMQ. When the utility fails to return within a pre-defined timeout, the OCF script considers RabbitMQ to be down and restarts it, which might lead to a limited (several minutes) OpenStack downtime. Such restarts are undesirable, as they cause downtime without benefit. To mitigate the issue, the OCF script can be told to tolerate a certain number of rabbitmqctl timeouts in a row using the following command:

    crm_resource --resource p_rabbitmq-server --set-parameter \
        max_rabbitmqctl_timeouts --parameter-value N

Here N should be replaced with the number of timeouts. For instance, if it is set to 3, the OCF script will tolerate two rabbitmqctl timeouts in a row, but fail if the third one occurs. By default the parameter is set to 1, i.e. a rabbitmqctl timeout is not tolerated at all. The downside of increasing the parameter is that if a real issue occurs which causes rabbitmqctl timeouts, the OCF script will detect it only after N monitor runs, so the restart, which might fix the issue, will be delayed.

To understand whether a RabbitMQ restart was caused by a rabbitmqctl timeout, examine lrmd.log of the corresponding controller on the Fuel master node in the /var/log/docker-logs/remote/ directory. Lines like

    "the invoked command exited 137: /usr/sbin/rabbitmqctl list_channels ..."

indicate a rabbitmqctl timeout. The next line will explain whether it caused a restart or not. For example:

    "rabbitmqctl timed out 2 of max. 3 time(s) in a row. Doing nothing for now."

DocImpact: user-guide, operations-guide
Closes-Bug: #1479815
Closes-Bug: #1487517
Change-Id: I9dec06fc08dbeefbc67249b9e9633c8aab5e09ca
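The tolerance rule described in this commit message can be sketched as a small counter. This is an illustration under stated assumptions: `tolerate_timeout` and `timeouts_in_a_row` are hypothetical names, the counter lives in a plain shell variable (a real agent would have to persist it between monitor runs), and non-timeout failures are out of scope and simply reset the counter.

```shell
max_rabbitmqctl_timeouts=3
timeouts_in_a_row=0

# Hypothetical helper: $1 is the exit code of a 'heavy' rabbitmqctl call.
# Returns 0 when monitoring should carry on, 1 when the limit is reached.
tolerate_timeout() {
    local rc="$1"
    # 124 and 137 are treated as "timed out"; anything else resets the streak.
    if [ "$rc" -ne 124 ] && [ "$rc" -ne 137 ]; then
        timeouts_in_a_row=0
        return 0
    fi
    timeouts_in_a_row=$((timeouts_in_a_row + 1))
    if [ "$timeouts_in_a_row" -ge "$max_rabbitmqctl_timeouts" ]; then
        return 1
    fi
    echo "rabbitmqctl timed out ${timeouts_in_a_row} of max. ${max_rabbitmqctl_timeouts} time(s) in a row. Doing nothing for now." >&2
    return 0
}
```

With the limit set to 3, two consecutive timeouts are tolerated and the third one fails; with the default of 1, the first timeout already fails, matching the behaviour described in the text.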
Commit 8a2afcb
-
Commit 96d0a34
Commits on Sep 15, 2015
-
Return NOT_RUNNING when beam is not RUNNING
Change get_status to return NOT_RUNNING when beam is not_running. Otherwise, pacemaker will get stuck during rabbitmq failover and will not attempt to restart the failed resource Change-Id: I926a3eafa9968abdf07baa5f2d5c22480300fb30 Closes-bug: #1484280
Vladimir Kuklin committed Sep 15, 2015 (4f15f6b)
Commits on Sep 22, 2015
-
On notify, if we detect that we are a part of a cluster we still need to start the RabbitMQ application, because it is always down after action_start finishes. Closes-Bug: #1496386 Change-Id: I307452b687a6100cc4489c8decebbc3dccdbc432
Commit 64285b3
Commits on Oct 9, 2015
-
Avoid division operation in shell
When the data returned from 'rabbitmqctl list_queues' grows a lot and awk sums up all the rows especially for memory calculation it returns the sum in scientific notation (example from bug was .15997e+09), later when we want to calculate the memory in MB instead of bytes, the bash division does not like this string. We can just avoid the situation by doing the division into MB in awk itself. Since we don't need the memory in bytes anyway. Closes-Bug: #1503331 Change-Id: I38d25406b84d0f70ed62101d5fb5ba108bcab8bd
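The approach of doing the conversion inside awk can be sketched as follows. The sample data and both function names are hypothetical, and 1 MB is assumed to be 1048576 bytes; the commit message does not state which divisor the agent uses.

```shell
# Hypothetical sample in the shape of a per-queue memory column:
# one byte count per line.
sample_memory() {
    printf '%s\n' 104857600 52428800 2097152
}

# Sum the column and convert to MB inside awk, so the shell never has to
# do arithmetic on a value that awk may have printed in scientific notation.
total_memory_mb() {
    awk '{ sum += $1 } END { printf "%d\n", sum / 1048576 }'
}
```

Running `sample_memory | total_memory_mb` prints `152`, since the three sample values are exactly 100, 50 and 2 MB.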
Commit 3b4a81c
-
Wait for rabbitmq sync before stop/demote actions
Added new OCF key stop_time (corresponding to start_time) Added wait_sync function which tries until start_time/2 for queues on stopped/demoted node to reach synced state. Added optional [-t timeout] to su_rabbit_cmd function to provide arbitrary timeout Change-Id: Iae2211b3d477a9603a58d5eacb12e0fba924861a Closes-Bug: #1464637
Commit 806cfd2
Commits on Oct 12, 2015
-
Commit 2b3f58e
Commits on Oct 16, 2015
-
Sync rabbitmq OCF from upstream
Sync upstream changes back to Fuel downstream Source https://github.com/rabbitmq/rabbitmq-server version stable/fedfefebaa39a0aeb41cf9328ba44c3a458e4614 Related blueprint upstream-rabbit-ocf Closes-bug: #1473015 Change-Id: Ie19c2f071c53b873a359c6c5134e9498c6391e66 Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Oct 16, 2015 (da604f9)
Commits on Oct 20, 2015
-
Packages are now "self-hosted": no need for the packaging dir
... in the source distribution anymore
Commit a796ec8
Commits on Oct 21, 2015
-
Fix the timeout arg for the su_rabbit_cmd
And fix local bashisms as a little bonus Upstream patch rabbitmq/rabbitmq-server#374 Related-bug: #1464637 Change-Id: I13189de9f8abce23673c031d11132e495e1972e3 Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Oct 21, 2015 (27a0454)
Commits on Oct 22, 2015
-
Fix piped exit codes expectations and count processing
* Fix the return code of get_all_pacemaker_nodes() and get_alive_pacemaker_nodes_but(): do not provide it, as it is ignored anyway.
* Fix the return code expectation for the fetched count attribute in check_timeouts().
Upstream patch rabbitmq/rabbitmq-server#374
Closes-bug: #1506440
Change-Id: I44a6cff2ccba1ba53a18da90c9d74cbb6084ca0c
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Oct 22, 2015 (8ca9174)
Commits on Oct 23, 2015
-
Commit b867ae0
Commits on Oct 26, 2015
-
Commit f0ff141
Commits on Nov 5, 2015
-
Don't update .erlang.cookie on every run
Update happens even during no-op commands like 'meta-data' or 'usage'. During this update there is a short window for a race condition: a shell redirection truncates the cookie file, and echo writes data there only after a brief period of time. So erlang may read data from this empty file and die with error "Too short cookie string". Change-Id: I4c3201617669f3872145048b77337632cb93558c Closes-Bug: #1512754
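A minimal sketch of an idempotent cookie update, assuming a hypothetical helper `update_cookie` that takes the cookie path and the desired content. The first check is what the commit message describes (do nothing when nothing needs to change); writing to a temporary file and renaming it over the target is an extra precaution added here for illustration and is not taken from the commit.

```shell
# Hypothetical helper: $1 = cookie file path, $2 = desired cookie content.
update_cookie() {
    local file="$1"
    local want="$2"
    local tmp
    # No-op when the file already holds the desired content.
    if [ -f "$file" ] && [ "$(cat "$file")" = "$want" ]; then
        return 0
    fi
    # Write to a temporary file in the same directory, then rename it over
    # the target, so a reader never observes a truncated, empty cookie.
    tmp="$(mktemp "${file}.XXXXXX")" || return 1
    printf '%s' "$want" > "$tmp" && chmod 600 "$tmp" && mv -f "$tmp" "$file"
}
```

Called on every agent invocation, including no-op actions such as 'meta-data', this touches the file only when its content actually differs.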
Commit d048c74
Commits on Nov 9, 2015
-
Commit 05f33de
-
Don't update cookie on every run of HA OCF script
Update happens even during no-op commands like 'meta-data' or 'usage'. During this update there is a short window for a race condition: a shell redirection truncates the cookie file, and echo writes data there only after a brief period of time. So erlang may read data from this empty file and die with the error "Too short cookie string".
Commit 38afe77
-
Merge pull request ClusterLabs#411 from binarin/rabbitmq-server-ocf-idempotent-cookie
Don't update cookie on every run of HA OCF script
Commit 6d8c983
Commits on Nov 12, 2015
-
Bind rabbitmq, epmd, and management plugin to internal IP
RabbitMQ itself was already listening on the correct IP for controllers, but epmd and management plugin listened everywhere (although management was covered by firewall rules). This covers all RabbitMQ server connection binding so that all connections are done on the same IP address (with the unfortunate side effect of blocking localhost connections). Removed unused parameter rabbitmq_host from nailgun::rabbitmq. Change-Id: I9bfb8bc85fcd6d4711c4ca9d79745ad2ce7e673a Closes-Bug: #1501731
Commit 0432228
Commits on Nov 16, 2015
-
Commit 937890f
-
Working with RMQ definitions via management plugin requires knowing the IP address where it listens. host_ip parameter will default to 127.0.0.1, but is configurable.
Commit 222ffcd
Commits on Dec 9, 2015
-
Commit 9777d8e
Commits on Dec 11, 2015
-
Add ability to disable HA for RabbitMQ queues
Add two flags:
* enable_rpc_ha, which enables queue mirroring for RPC queues
* enable_notifications_ha, which enables queue mirroring for Ceilometer queues
Since the feature is experimental, both flags are set to true by default to preserve current behaviour.

The change is implemented in several steps:
* The upstream script is changed so that it allows extending the list of parameters and uses a policy file to define RabbitMQ policies.
* We add our own version of the OCF script, which wraps around the upstream one. It defines the new enable_rpc_ha and enable_notifications_ha parameters and passes their values to the upstream script.
* We add our policy file, where we use the introduced parameters to decide which policies we should set.

So we will have two OCF scripts for RabbitMQ in our deployment:
* rabbitmq-server-upstream - the upstream version
* rabbitmq-server - our extension, which will be used in the environment

The upstream version of the script is pushed upstream along with an empty policy file, so that other users can define their own policies or extend the script if needed. Here are the corresponding pull requests: rabbitmq/rabbitmq-server#480 rabbitmq/rabbitmq-server#482 (both are already merged)

Text for Operations Guide

It is possible to significantly reduce the load which OpenStack puts on RabbitMQ by disabling queue mirroring. This can be done separately for RPC queues and Ceilometer ones. To disable mirroring for RPC queues, execute the following command on one of the controllers:

    crm_resource --resource p_rabbitmq-server --set-parameter \
        enable_rpc_ha --parameter-value false

To disable mirroring for Ceilometer queues, execute the following command on one of the controllers:

    crm_resource --resource p_rabbitmq-server --set-parameter \
        enable_notifications_ha --parameter-value false

In order for any of the changes to take effect, the RabbitMQ service should be restarted. To do that, first execute

    pcs resource disable master_p_rabbitmq-server

Then monitor the RabbitMQ state using the command

    pcs resource

until it shows that all RabbitMQ nodes are stopped. Once they are, execute the following command to start RabbitMQ:

    pcs resource enable master_p_rabbitmq-server

Beware: during the restart all messages accumulated in RabbitMQ will be lost. Also, OpenStack will stop functioning until RabbitMQ is up again, so plan accordingly.

Note that it is not yet well tested how this configuration affects failover when some cluster nodes go down. Hence it is experimental, use at your own risk!

DocImpact: ops-guide
Implements: blueprint rabbitmq-disable-mirroring-for-rpc
Change-Id: I80ae231ca64e2a903b0968d36ba0e85ca9cc9891
Commit 129dbce
Commits on Dec 14, 2015
-
Commit 2d05408
-
Fix default value for 'use_fqdn' in meta_data
This change fixes the copy-paste gone wrong and pulls in the rabbitmq upstream commit of c85fdd0f5c54f312fc2147dad2b956961aae3f12. Closes-Bug: #1526062 Change-Id: I49e45cd893af8c65ed5ddd3efb834e38737a69a2
Commit 2a52905
Commits on Dec 30, 2015
-
Fix stop conditions for the rabbit OCF resource
* Fix get_status() unexpectedly reporting a generic error instead of "not running"
* Add proc_stop and proc_kill functions (TODO: these shall eventually go into external common ocf helpers)
* Rework stop_server_process()
  - make it return SUCCESS/ERROR as expected
  - grant "rabbitmqctl stop" a graceful termination window and only then ensure the beam process termination and pidfile removal as well
  - return the actual status with get_status()
* Rework kill_rmq_and_remove_pid()
  - use proc_stop to try to kill by pgrp with -TERM, then -KILL, or by the beam process name match if there is no PID
  - make it return SUCCESS/ERROR
* Fix action_stop()
  - fail early on the stop_server_process() results without additional rabbitmqctl invocations in the get_status() call
  - rework the hard-coded sleep 10 to use the graceful stop window in stop_server_process() instead
  - ensure the rabbit-start-time removal from the CIB before trying to stop the server process
  - issue the "stop: action end" log record before the actual end
* Add comments and make logs more informative
Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1529897
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Alex Schultz <aschultz@mirantis.com>
Bogdan Dobrelya and Alex Schultz committed Dec 30, 2015 (df33e89)
Commits on Dec 31, 2015
-
Ensure rabbit node uptime is reset in the CIB for OCF resource
* Add ocf_run wrappers and info log messages for CIB attribute events
* Move "fast" CIB attribute updates before "heavy" operations like start/stop/wait to keep the CIB consistent even if the timeouts are exceeded for the ops
* Delete the master and start time attributes from the CIB on action_start to ensure correct evaluation of rabbit node uptime for new master elections for the corresponding pacemaker resources
* For post-demote notify and action_demote(), delete the master attribute from the CIB as well
* For post-start notify, update the start time in the CIB even when the node is already clustered. Otherwise it would remain running in the cluster w/o the start time registered, which badly affects new master elections.
* Fix a wrong log message when joining by a node
Related Fuel bugs https://bugs.launchpad.net/fuel/+bug/1530150 https://bugs.launchpad.net/fuel/+bug/1530296
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Dec 31, 2015 (SHA 48d7106)
-
Fix rabbit OCF log message when joining by a node
Closes-bug: #1530296 Change-Id: Id2258da4f272dc8eca92130d45ecb69a16ed7c35 Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Dec 31, 2015 (SHA 6382b99)
Commits on Jan 7, 2016
-
Remove unneeded sleep for a graceful stop by PID
The sleep is not needed. According to https://www.rabbitmq.com/man/rabbitmqctl.1.man.html: "If a pid_file is specified, also waits for the process specified there to terminate."
Related Fuel bug https://launchpad.net/bugs/1529897
Related PR rabbitmq/rabbitmq-server#523
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Jan 7, 2016 (SHA b2cea03)
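The behaviour quoted from the rabbitmqctl manpage means a graceful stop can be a single blocking call. A minimal sketch, assuming a hypothetical wrapper name graceful_stop and a RABBITMQCTL override variable that exists only so the command can be substituted:

```shell
#!/bin/sh
# Hypothetical wrapper: graceful_stop and RABBITMQCTL are assumptions.
# "rabbitmqctl stop <pid_file>" waits until the process named in the
# pidfile has terminated, so no extra sleep is needed afterwards.
graceful_stop() {
    pidfile="$1"
    "${RABBITMQCTL:-rabbitmqctl}" stop "$pidfile"
}
```

The exit status of the wrapper is that of rabbitmqctl itself, so the caller can decide whether a forced kill is still required.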
Commits on Jan 11, 2016
-
Syntax and local vars usage fixes to OCF HA
Related Fuel bug: https://launchpad.net/bugs/1529897 Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Jan 11, 2016 (SHA e4255b2)
-
Fix proc_kill when there is no pid found
Without this fix, the rabbit OCF agent cannot make proc_stop kill a pid-less beam process by name matching, because proc_kill()'s first parameter cannot be passed empty. The fix is to use the value "none" when a pid-less process must be matched by service_name instead. Also, fix proc_kill to handle multi-process pidfiles (several space-separated PIDs).
Related Fuel bugs: https://launchpad.net/bugs/1529897 https://launchpad.net/bugs/1532723
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Jan 11, 2016 (SHA 051514a)
-
Fix get_status, action_stop, proc_stop when beam is unresponsive
* Fix get_status() to catch the beam state and output errors
* Fix action_stop() to force name-based matching when there is no pidfile and beam is unresponsive
* Fix proc_stop to use name-based matching if no pidfile is found
* Fix proc_stop to retry sending the signal when using name-based matching as well
Without this patch, the following situations are possible:
  - beam is running and cannot process signals, but get_status() reports it as "not running" when it should report a generic error
  - which_applications() returned an error, yet its output is still parsed for the "what" match when it should not be
  - action_stop and proc_stop give up when there is no pidfile and beam is running but unresponsive
The solution is to make get_status return a generic error and to make action_stop kill the rabbit process by name matching.
Related Fuel bug: https://bugs.launchpad.net/fuel/+bug/1529897
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Jan 11, 2016 (SHA 0f975e5)
-
Fix monitor/stop operations for the rabbit OCF resource
Without this fix, the following situations are possible:
  - beam is running and cannot process signals, but get_status() reports it as "not running" when it should report a generic error
  - which_applications() returned an error, yet its output is still parsed for the "what" match when it should not be
  - action_stop and proc_stop give up when there is no pidfile and beam is running but unresponsive
The solution is to make get_status return a generic error and to make action_stop kill the rabbit process by name matching.
These and other related fixes are listed below (tl;dr):
* Fix get_status, action_stop, proc_stop when beam is unresponsive (i.e. fails to process signals or does so very slowly)
  - Fix get_status() to catch the beam state and output errors
  - Fix action_stop() to force name-based matching when there is no pidfile and beam is unresponsive
  - Fix proc_stop to use name-based matching if no pidfile is found
  - Fix proc_stop to retry sending the signal when using name-based matching as well
* Fix get_status() unexpectedly reporting a generic error instead of "not running"
* Add reworked proc_stop and proc_kill functions from ocf-fuel-funcs
* Rework stop_server_process()
  - make it return SUCCESS/ERROR as expected
  - grant "rabbitmqctl stop" a graceful termination window, and only then ensure the beam process is terminated and the pidfile is removed
  - return the actual status with get_status()
* Rework kill_rmq_and_remove_pid()
  - use proc_stop to try to kill by pgrp with -TERM, then -KILL, or by matching the beam process name if there is no PID
  - make it return SUCCESS/ERROR
* Fix action_stop()
  - fail early on the stop_server_process() result, without additional rabbitmqctl invocations in the get_status() call
  - replace the hard-coded sleep 10 with the graceful stop window in stop_server_process()
  - ensure rabbit-start-time is removed from the CIB before trying to stop the server process
  - issue the "stop: action end" log record before the actual end
* Add comments, adjust log levels and make logs more informative
Upstream PRs rabbitmq/rabbitmq-server#523 rabbitmq/rabbitmq-server#532 rabbitmq/rabbitmq-server#538 rabbitmq/rabbitmq-server#540
Closes-bug: #1529897
Change-Id: I1c382e3cf004630847b6626fabaecaa0094ee271
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Jan 11, 2016 (SHA 40fc6d9)
Commits on Jan 12, 2016
-
Ensure rabbit node uptime is reset in the CIB for OCF resource
* Add ocf_run wrappers and info log messages for CIB attribute events
* Move "fast" CIB attribute updates before "heavy" operations like start/stop/wait, to keep the CIB consistent even if the operation timeouts are exceeded
* Delete the master and start time attributes from the CIB on action_start, to ensure correct evaluation of rabbit node uptime in new master elections for the corresponding pacemaker resources
* For post-demote notify and action_demote(), delete the master attribute from the CIB as well
* For post-start notify, update the start time in the CIB even when the node is already clustered. Otherwise it would remain running in the cluster without a registered start time, which badly affects new master elections
Upstream PR rabbitmq/rabbitmq-server#524
Closes-bug: #1530150
Change-Id: I9db3c819031cef620377b4fee08ea92e90b11c70
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Jan 12, 2016 (SHA 8c4d847)
Commits on Jan 14, 2016
-
Fix rabbitMQ OCF monitor detection of running master
When monitor detects the node as OCF_RUNNING_MASTER, this result may be lost while the monitor checks are in progress.
* Rework prev_rc using rc_check to fix this
* Also add an info log when the node is detected as running master
* Break the monitor check loop early if the monitor is going to exit so that pacemaker restarts the resource
* Do not recheck the master status and do not update the master score if monitor has already detected the node as OCF_RUNNING_MASTER. At that point, a running and healthy master need not be checked against other nodes' uptime: it is pointless and only makes the monitor action take more time and resources to finish
* Fail early if monitor detected the node as OCF_RUNNING_MASTER but the rabbit beam process is not running
* For OCF_CHECK_LEVEL>20, exclude the current node from the check loop, as it has already been checked
Closes-bug: #1531838
Change-Id: I319db307c73ef24d829be44eeb63d1f52f4180fa
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Jan 14, 2016 (SHA a6bcc9a)
-
Fix rabbitMQ OCF monitor detection of running master
When monitor detects the node as OCF_RUNNING_MASTER, this result may be lost while the monitor checks are in progress.
* Rework prev_rc using rc_check to fix this
* Also add an info log when the node is detected as running master
* Break the monitor check loop early if the monitor is going to exit so that pacemaker restarts the resource
* Do not recheck the master status and do not update the master score if monitor has already detected the node as OCF_RUNNING_MASTER. At that point, a running and healthy master need not be checked against other nodes' uptime: it is pointless and only makes the monitor action take more time and resources to finish
* Fail early if monitor detected the node as OCF_RUNNING_MASTER but the rabbit beam process is not running
* For OCF_CHECK_LEVEL>20, exclude the current node from the check loop, as it has already been checked
Related Fuel bug: https://launchpad.net/bugs/1531838
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Jan 14, 2016 (SHA b7815e4)
-
Introduce node name prefix for mgmt/messaging IPs
RabbitMQ will resolve <prefix>-<fqdn> hostnames to valid mgmt/messaging IP Change-Id: Ifc2af16b08663655d365587ea6f45c87bfc68698 Depends-On: I9813fa8c20d47e0ef1e251fe5ac8d01d08fe7703 Closes-bug: #1528707
SHA c4c4ae7
Commits on Jan 18, 2016
-
Add optional prefix for RabbitMQ node FQDNs
This allows instantiating multiple rabbit clusters built from prefix-based instances of rabbit nodes.
SHA 5a9d7ce
-
SHA a88ec6c
Commits on Jan 19, 2016
-
Reset master score if we decide to restart RabbitMQ on timeout
Doing otherwise might not trigger the restart while it is clearly needed.
SHA e47d4a9
-
Reset master score if we decide to restart RabbitMQ on timeout
Doing otherwise might not trigger the restart while it is clearly needed. Upstream PR: rabbitmq/rabbitmq-server#560 Change-Id: I480ebaddc98fa0784098efbf0c5ab8c512c8661d Closes-Bug: #1513421
SHA f958037
Commits on Jan 20, 2016
-
Improve rabbitmq OCF script diagnostics
Currently, a time-out when running 'rabbitmqctl list_channels' is treated as a sign that the current node is unhealthy. But that may not be the case, as the hanging channel could actually be on some other node. Given that we currently have more than one bug related to 'list_channels', it makes sense to improve diagnostics here. This patch doesn't change any behaviour; it only improves logging after a time-out happens. If time-outs continue to occur (even with the latest rabbitmq versions or with backported fixes), we could switch to this improved list_channels and kill rabbitmq only if stuck channels are located on the current node. But I hope that all related rabbitmq bugs have already been closed.
SHA c0c6480
-
Improve 'list_channels' diagnostics in OCF
The timeout(1) manpage mentions 124 as another valid return code, in addition to 128 + signal-number.
SHA f79c7a6
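The two return codes mentioned above can be checked together when deciding whether a command run under timeout(1) timed out. A minimal sketch, assuming a hypothetical helper name is_timeout_rc and assuming the only signal of interest is KILL (128 + 9 = 137):

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the exit status looks like a time-out
# reported by timeout(1): 124, or 128 + signal number (137 for KILL).
is_timeout_rc() {
    rc="$1"
    case "$rc" in
        124|137) return 0 ;;
        *)       return 1 ;;
    esac
}
```

A caller would capture `$?` right after the timeout invocation and pass it in, for example `timeout 5 some_command; is_timeout_rc $? && echo "timed out"`, where some_command stands for whatever is being guarded.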
Commits on Jan 21, 2016
-
Merge pull request ClusterLabs#563 from binarin/rabbitmq-server-ocf-list-channels-diagnostics
Improve OCF script diagnostics for timed-out 'list_channels'
SHA 4440c79
-
SHA aaade82
-
Fix uninitialized variable in rabbitmq script
Upstream: rabbitmq/rabbitmq-server#571 Shell was sometimes complaining at line 1447 due to empty `rc_check` Change-Id: I9411fbc41f8ebf6ac41504ff7456ee7952485564 Partial-Bug: #1531838
SHA 505f048
Commits on Jan 25, 2016
-
Improve OCF script diagnostics for timed-out 'list_channels'
Upstream PR: rabbitmq/rabbitmq-server#563
Currently, a time-out when running 'rabbitmqctl list_channels' is treated as a sign that the current node is unhealthy. But that may not be the case, as the hanging channel could actually be on some other node. Given that we have currently seen more than one bug related to 'list_channels', it makes sense to improve diagnostics here. This patch doesn't change any behaviour; it only improves logging after a time-out happens. If time-outs continue to occur (even with the latest rabbitmq versions or with backported fixes), we could switch to this improved list_channels and kill rabbitmq only if stuck channels are located on the current node. But I hope that all related rabbitmq bugs have already been closed.
Change-Id: I4746d3a4e85dc2a51af581034ae09a1cf0eefce2
Partial-Bug: #1515223
Partial-Bug: #1513511
SHA ffe2ad4
Commits on Jan 26, 2016
-
SHA 0cc3bb6
Commits on Feb 2, 2016
-
Suppress curl progress indicator in rabbit OCF
curl is used by the OCF script for fetching definitions (queues etc.), but the output of that invocation shows up as garbage in pacemaker logs, since a progress indicator makes no sense in logs. According to the curl manpage, the option combination "--silent --show-error" should be used: it suppresses only the progress indicator, while errors are still shown. Other short curl options are also replaced with their long counterparts for improved readability.
SHA afc03f6
-
Fix uninitialized status_master
Fix multiple nodes being reported in logs as the running master
Related Fuel bug https://bugs.launchpad.net/bugs/1540936
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Feb 2, 2016 (SHA f1a4c73)
-
Fix cluster membership check for running master
The running master is always inside of its own cluster. Fix the cluster membership check when a node is the master.
SHA 2078fa9
Commits on Feb 3, 2016
-
Fix uninitialized status_master
Fix multiple nodes being reported in logs as the running master
Closes-bug: #1540936
Change-Id: Ic2dfe7b2ba657b9bf06d97f49ddb4b69f2f4e063
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
SHA 37cc8b4
-
Streamline checking for cluster partitioning
Move the check for whether we are the current cluster master to an earlier place in the code. That way we avoid unnecessary operations in the master case.
SHA 6183b23
Commits on Feb 4, 2016
-
Fix action_stop for the rabbit OCF
action_stop may sometimes stop rabbitmq-server gracefully by PID but leave unresponsive beam.smp processes running and spoiling rabbits. Those must be stopped as well. The solution is:
- make proc_stop() accept pid=none to use name matching instead
- make kill_rmq_and_remove_pid() stop by beam process matching as well
- fix stop_server_process() to ensure no beam process is left running
Closes-bug: #1541029
Change-Id: Ib9669d15bb714be8a88fd65d7f1815173da788d3
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Feb 4, 2016 (SHA 9b7d9c5)
-
Fix action_stop for the rabbit OCF
action_stop may sometimes stop rabbitmq-server gracefully by PID but leave unresponsive beam.smp processes running and spoiling rabbits. Those must be stopped as well. The solution is:
- make proc_stop() accept pid=none to use name matching instead
- make kill_rmq_and_remove_pid() stop by beam process matching as well
- fix stop_server_process() to ensure no beam process is left running
Related Fuel bug: https://launchpad.net/bugs/1541029
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Feb 4, 2016 (SHA 19e931a)
-
SHA 77fdb12
Commits on Feb 10, 2016
-
Do not check cluster health if master is not elected
Doing otherwise causes the node to restart when get_monitor is called within action_promote: it does not find a master and assumes that it is running outside the cluster. Also, the code is refactored a little: a new function returning the current master is created and used in the changed code.
Closes-Bug: #1543154
Change-Id: If14fcfc915d76c9580be0a097b250d79cf953b9e
SHA f5ed86e
-
Exit waiting loop once node has unjoined
Without the break we always wait for 50 seconds, even if we don't need to wait at all. Change-Id: Ib361fbac714d61056f4b9d71f23bb74af33abf77
SHA 95a2b63
-
On neighbor promotion do nothing if we are already clustered
+ extracted a function that checks whether we are in the same cluster as a given node
+ made post-promote ignore promotion of self. Previously this was done inside jjj_join, but now it must happen before the new check
+ the "post-promote end" log entry is now written at the very end of post-promote, not somewhere in the middle
Closes-Bug: #1544036
Change-Id: Id28d6c94abe5d96452f7ecba2b3fe022f40afa0d
SHA 0e0feb6
Commits on Feb 16, 2016
-
SHA 3f88cd2
-
Exit waiting loop once node has unjoined
Without the break we always wait for 50 seconds, even if we don't need to wait at all.
SHA e99b09a
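The effect of the added break can be sketched as below. This is an illustration only: wait_for_unjoin and node_is_joined are hypothetical names, and the defaults of 10 tries with a 5-second delay are an assumption chosen to match the 50 seconds mentioned in the commit message.

```shell
#!/bin/sh
# Placeholder membership check (assumption): return 0 while the node is
# still clustered. A real agent would query the cluster here.
node_is_joined() {
    return 1
}

# Wait for the node to leave the cluster; print how many delays elapsed.
wait_for_unjoin() {
    tries="${1:-10}"
    delay="${2:-5}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        if ! node_is_joined; then
            # Unjoined: stop waiting immediately instead of
            # sleeping through the remaining iterations.
            break
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "$i"
}
```

Without the break, the loop would always run all iterations, which is the fixed 50-second wait the commit removes.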
Commits on Feb 19, 2016
-
Private attributes usage in rabbitmq script
There are three types of rabbitmq attributes for pacemaker nodes:
- 'rabbit-master'
- 'rabbit-start-time'
- timeouts:
  - 'rabbit_list_channels_timeouts'
  - 'rabbit_get_alarms_timeouts'
  - 'rabbit_list_queues_timeouts'
The 'rabbit-master' and 'rabbit-start-time' attributes should be public, because the script monitors these attributes in a loop for all nodes. All timeout attributes were changed to private to avoid unnecessary transitions. Also, the --lifetime and --node options were removed from attrd_updater calls, as 'lifetime' for this command is always 'reboot' and 'node' defaults to the local node.
SHA d0e7389
-
Merge pull request ClusterLabs#639 from lefremova/stable
Private attributes usage in rabbitmq script
SHA 277a1d4
Commits on Feb 24, 2016
-
Private attributes usage in rabbitmq script
There are three types of rabbitmq attributes for pacemaker nodes:
- 'rabbit-master'
- 'rabbit-start-time'
- timeouts:
  - 'rabbit_list_channels_timeouts'
  - 'rabbit_get_alarms_timeouts'
  - 'rabbit_list_queues_timeouts'
The 'rabbit-master' and 'rabbit-start-time' attributes should be public, because the script monitors these attributes in a loop for all nodes. All timeout attributes were changed to private to avoid unnecessary transitions. Also, the --lifetime and --node options were removed from attrd_updater calls, as 'lifetime' for this command is always 'reboot' and 'node' defaults to the local node.
Closes-bug: #1524672
Change-Id: Ie45ae3a82b8daa35dbdd977dc894877160af457b
SHA 478fd4a
Commits on Feb 25, 2016
-
[OCF HA] Increase tolerable number of rabbitmqctl timeouts
We still see that rabbitmqctl list_channels times out from time to time, though the RabbitMQ cluster is absolutely healthy in any other aspect. Setting max_rabbitmqctl_timeouts to 3 seems to be a sane default to help avoid unnecessary restarts.
SHA 9c8e3da
-
[OCF HA] Log process id in RabbitMQ OCF script
Several OCF calls might run simultaneously. For example, it often happens that two monitor calls intersect. Logging current process id for each line helps distinguish logs of different calls. Also aligned get_status() logging with format used in all other parts of the script.
SHA 88afa77
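Tagging each log line with the process id of the current invocation could look roughly like this. The function name log_with_pid and the message format are assumptions for illustration; the agent's real logging goes through its own helpers.

```shell
#!/bin/sh
# Hypothetical sketch: prefix every message with the PID of the running
# script, so overlapping invocations can be told apart in the logs.
log_with_pid() {
    level="$1"
    shift
    echo "${level}: [$$] $*"
}

log_with_pid info "monitor: action begin"
```

Two monitor calls that overlap in time then produce lines with different bracketed PIDs, which is enough to separate their output when reading the log.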
Commits on Feb 26, 2016
-
[OCF HA] Do not check cluster health if master is not elected
Doing otherwise causes the node to restart when get_monitor is called within action_promote: it does not find a master and assumes that it is running outside the cluster. Also, the code is refactored a little: a new function returning the current master is created and used in the changed code.
SHA 3ac28c4
Commits on Feb 29, 2016
-
Increase tolerable number of rabbitmqctl timeouts
We still see that rabbitmqctl list_channels times out from time to time, though the RabbitMQ cluster is absolutely healthy in any other aspect. Setting max_rabbitmqctl_timeouts to 3 seems to be a sane default to help avoid unnecessary restarts. Upstream PR: rabbitmq/rabbitmq-server#650 Closes-Bug: #1550293 Change-Id: I6b0686ef66ba3966e03c8706594f473e9ab01145
SHA 86d375f
-
[OCF HA] On neighbor promotion do nothing if we are already clustered
+ extracted a function that checks whether we are in the same cluster as a given node
+ made post-promote ignore promotion of self. Previously this was done inside jjj_join, but now it must happen before the new check
+ the "post-promote end" log entry is now written at the very end of post-promote, not somewhere in the middle
SHA 7a03700
-
Suppress curl progress indicator in rabbit OCF
Upstream PR: rabbitmq/rabbitmq-server#597
curl is used by the OCF script for fetching definitions (queues etc.), but the output of that invocation shows up as garbage in pacemaker logs, since a progress indicator makes no sense in logs. According to the curl manpage, the option combination "--silent --show-error" should be used: it suppresses only the progress indicator, while errors are still shown. Other short curl options are also replaced with their long counterparts for improved readability.
Change-Id: I5ae35b3f76dc33be68c79f5dc983f0c779529fb9
Closes-Bug: #1540831
SHA b52f1ed
-
SHA 492c853
-
SHA b90d128
Commits on Mar 2, 2016
-
SHA 3b56284
Commits on Mar 4, 2016
-
Log process id in RabbitMQ OCF script
Several OCF calls might run simultaneously. For example, it often happens that two monitor calls intersect. Logging current process id for each line helps distinguish logs of different calls. Also aligned get_status() logging with format used in all other parts of the script. Upstream PR: rabbitmq/rabbitmq-server#653 Closes-Bug: 1553089 Change-Id: Icbaeb560021f70ef13e062cb79fe2cba84e33dce
SHA f165474
Commits on Mar 7, 2016
-
Merge pull request ClusterLabs#653 from dmitrymex/log-pid
[OCF HA] Log process id in RabbitMQ OCF script
SHA 7979d71
Commits on Mar 9, 2016
-
SHA e6adbe9
Commits on Mar 11, 2016
-
Revert "Merge "Private attributes usage in rabbitmq script""
This reverts commit 686bed1b4f090d7f6fd368b94a5ced12c8e28744, reversing changes made to d42a753d75dc419c123de257a974ca9c175789f7. Change-Id: I56ce3671558cf12ab7ce7d616e14cf27f3adb5f1 Closes-bug: #1556123
Bogdan Dobrelya committed Mar 11, 2016 (SHA e9b3c7d)
-
Revert "Private attributes usage in rabbitmq script"
This reverts commit 4aeaa79bc566c81bc7f5c20d7afbe39c32771aba. Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1556123
Bogdan Dobrelya committed Mar 11, 2016 (SHA 214275b)
-
Revert "Private attributes usage in rabbitmq script"
This reverts commit 4aeaa79bc566c81bc7f5c20d7afbe39c32771aba. Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1556123
Bogdan Dobrelya committed Mar 11, 2016 (SHA f6bdfe7)
Commits on Mar 14, 2016
-
Merge pull request ClusterLabs#686 from bogdando/master
Revert "Private attributes usage in rabbitmq script"
SHA e0a81d1
Commits on Mar 24, 2016
-
Put the RabbitMQ OCF RA policy to /usr/sbin
* Fix the failing pcs resource list command and move the policy file from the ocf dir to the policy dir
* Configure the custom policy file to be picked up from /usr/sbin/set_rabbitmq_policy, as the fuel-libraryX package installs it
* As the upstream rabbitmq-server package does not install one, use /usr/local/sbin/... as the default policy OCF path param
* Add the policy_file param and unit tests to cluster::rabbitmq_ocf
Closes-bug: #1558627
Change-Id: I4937bde611b06c3e39385a322053610c98584d79
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Mar 24, 2016 (SHA d1c8e6b)
-
Put the RabbitMQ OCF RA policy to /usr/sbin
* Fix failing pcs resource list command
* Move policy file to examples in docs dirs
Related Fuel bug: https://bugs.launchpad.net/fuel/+bug/1558627
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Mar 24, 2016 (SHA 2ed9efd)
-
SHA b050d93
Commits on Apr 4, 2016
-
Fix half-hearted attempt to erase mnesia in OCF RA
ocf_run does `"$@"`, so "${MNESIA_FILES}/*" wasn't expanded and the mnesia directory wasn't actually cleaned up.
Fuel bug: https://bugs.launchpad.net/fuel/+bug/1565868
SHA 1a970ad
-
Fix half-hearted attempt to erase mnesia in OCF RA
ocf_run does $("$@"), so "${MNESIA_FILES}/*" wasn't expanded and the mnesia directory wasn't actually cleaned up. It's safe to remove that directory completely: it will be re-created automatically by mnesia.
Upstream rabbitmq/rabbitmq-server#724
Change-Id: I0aa47f61e03c99ee6ebb56b833463cdf4ccd243e
Closes-Bug: 1565868
SHA d53418e
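The expansion problem described above can be reproduced in isolation. In this sketch, run_quoted is a hypothetical stand-in for a wrapper that executes its arguments verbatim via "$@", and the temporary directory stands in for the mnesia directory.

```shell
#!/bin/sh
# run_quoted mimics a wrapper that executes its arguments as given.
run_quoted() {
    "$@"
}

dir=$(mktemp -d)
touch "$dir/a" "$dir/b"

# The quoted pattern reaches rm as a literal "*", so nothing is removed.
run_quoted rm -f "$dir/*"
remaining_after_quoted=$(ls "$dir" | wc -l)

# Removing the directory itself needs no glob expansion at all.
run_quoted rm -rf "$dir"
```

Because the shell expands globs before the command runs, and quoting suppresses that expansion, the wrapper never sees the individual file names. Removing the whole directory sidesteps the issue.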
Commits on Apr 5, 2016
-
SHA ed9056e
Commits on Apr 7, 2016
-
Stop a rabbitmq pacemaker resource when monitor fails
Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1567355 Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Apr 7, 2016 (SHA 3480cea)
-
Stop a rabbitmq pacemaker resource when monitor fails
Upstream PR rabbitmq/rabbitmq-server#731 Closes-bug: #1567355 Change-Id: I83415e0e2a40f0e99e7baa26e35b6f7463c52928 Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Apr 7, 2016 (SHA 6000a31)
-
SHA 985f90d
Commits on Apr 8, 2016
-
Commit 742e8c2
Commits on Apr 19, 2016
-
Stop process when rabbit is running but is not connected to master.
The node should go down to avoid split brain. Change-Id: I4c51f8608702f2284d835ba9c3c9070b2c329ed8 Closes-Bug: #1541471 Upstream PR: rabbitmq/rabbitmq-server#758
Maciej Relewicz committed Apr 19, 2016
Commit 4a4e013
Commits on Apr 20, 2016
-
Stop process when rabbit is running but is not connected to master.
The node should go down to avoid split brain. Related Fuel bug: https://bugs.launchpad.net/fuel/+bug/1541471 Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com> Co-authored-by: Maciej Relewicz <mrelewicz@mirantis.com>
Bogdan Dobrelya and Maciej Relewicz committed Apr 20, 2016
Commit 33b9f40
-
Commit bec20d0
Commits on May 10, 2016
-
Private attributes usage in rabbitmq script
There are three types of rabbitmq attributes for pacemaker nodes: 'rabbit-master', 'rabbit-start-time', and the timeouts ('rabbit_list_channels_timeouts', 'rabbit_get_alarms_timeouts', 'rabbit_list_queues_timeouts'). The attributes 'rabbit-master' and 'rabbit-start-time' should be public because the script monitors them in a loop for all nodes. All timeout attributes were changed to private to avoid unnecessary transitions. Also, the --lifetime and --node options were removed from the attrd_updater calls, as 'lifetime' for this command is always 'reboot' and 'node' defaults to the local node. This reverts commit b2b191d2e28b96c9f9a6ea440a383cf4f691d8ad (as the pacemaker version was updated). Closes-bug: #1524672 Change-Id: I6f0d4a99641b847321754d75605a78fbbc96ddad
Commit b8e9513
Commits on May 12, 2016
-
Private attributes usage in rabbitmq script
Requires Pacemaker >= 1.1.13 (the 'attrd_updater' command has had the '-p' option only since this version). There are three types of rabbitmq attributes for pacemaker nodes: 'rabbit-master', 'rabbit-start-time', and the timeouts ('rabbit_list_channels_timeouts', 'rabbit_get_alarms_timeouts', 'rabbit_list_queues_timeouts'). The attributes 'rabbit-master' and 'rabbit-start-time' should be public because the script monitors them in a loop for all nodes. All timeout attributes were changed to private to avoid unnecessary transitions. Also, the --lifetime and --node options were removed from the attrd_updater calls, as 'lifetime' for this command is always 'reboot' and 'node' defaults to the local node.
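As a rough illustration of a private attribute update, the sketch below wraps `attrd_updater` with the `-p` (private), `-n` (name) and `-U` (update) options and no node or lifetime arguments. The wrapper name `set_private_attr` is hypothetical and is not taken from the script; the example call is commented out because it needs a running Pacemaker cluster.

```sh
#!/bin/sh
# set_private_attr NAME VALUE
# Hypothetical helper: updates a node attribute on the local node and marks
# it private (-p), so that it is not written to the CIB.
set_private_attr() {
    attrd_updater -p -n "$1" -U "$2"
}

# Example call (requires Pacemaker >= 1.1.13, so it is left commented out):
# set_private_attr rabbit_list_queues_timeouts 1
```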
Commit 216e164
Commits on May 13, 2016
-
Commit 951a2d4
Commits on Jun 3, 2016
-
Check cluster_status liveness during OCF checks
We've observed an `autoheal` bug that made `cluster_status` become stuck forever.
Commit 8bdfa3e
Commits on Jun 7, 2016
-
Commit 385afe5
Commits on Jun 8, 2016
-
`-` is not allowed in function names by POSIX, and some shells (e.g. `dash`) will consider this as a syntax error.
Commit d9c434d
-
Commit 73368d2
Commits on Aug 15, 2016
-
Update iptables calls with --wait
If iptables is currently being called outside of the ocf script, the iptables call will fail because it cannot get a lock. This change updates the iptables call to include the -w flag which will wait until the lock can be established and not just exit with an error.
Alex Schultz committed Aug 15, 2016
Commit 53e9b1a
-
Update iptables calls with --wait
If iptables is currently being called outside of the ocf script, the iptables call will fail because it cannot get a lock. This change updates the iptables call to include the -w flag which will wait until the lock can be established and not just exit with an error.
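A minimal sketch of the idea: route every call through a small wrapper that always passes `-w`, so a concurrent iptables invocation makes the call wait for the lock rather than fail. The wrapper name `iptables_wait` is hypothetical; the example call is commented out because it needs root privileges.

```sh
#!/bin/sh
# iptables_wait ARGS...
# Hypothetical wrapper: runs iptables with -w so it waits for the lock
# instead of exiting with an error when another iptables call holds it.
iptables_wait() {
    iptables -w "$@"
}

# Example call (needs root and iptables, so it is left commented out):
# iptables_wait -L INPUT -n
```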
Alex Schultz committed Aug 15, 2016
Commit dce9ea0
Commits on Aug 16, 2016
-
Commit 470a2a9
Commits on Aug 18, 2016
-
Fix bashisms in rabbitmq OCF RA
Change "printf %b" so that it passes checkbashisms. Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Aug 18, 2016
Commit 2f7b806
-
Commit 63cc485
-
[OCF HA] Add ocf_get_private_attr function to RabbitMQ OCF script
The function is extracted from check_timeouts to be re-used later in other parts of the script. Also, switch check_timeouts to use the existing ocf_update_private_attr function.
Commit abda1ca
Commits on Aug 19, 2016
-
Commit 69c13d4
Commits on Aug 22, 2016
-
[OCF HA] Rank master score based on start time
Right now we assign 1000 to the oldest nodes and 1 to others. That creates a problem when the master restarts and no node is promoted until that node starts back. In that case the returned node will have a score of 1, like all other slaves, and Pacemaker will select it for promotion again. That node is completely empty, and afterwards the other slaves join it, wiping their data as well. As a result, we lose all the messages. The new algorithm actually ranks nodes, not just selects the oldest one. It also maintains the invariant that if node A started later than node B, then node A's score must be smaller than that of node B. As a result, a freshly started node has no chance of being selected in preference to an older node. If several nodes start simultaneously, an older node among them might temporarily receive a lower score than a younger one, but that is negligible. Also remove any action on demote or demote notification: all of these duplicate actions done in stop or stop notification. With these removed, changing the master on a running cluster does not affect the RabbitMQ cluster in any way; we just declare another node master and that is it. This is important for the current change because the master score might change after initial cluster start-up, causing master migration from one node to another. This fix is a prerequisite for the fix to Fuel bugs https://bugs.launchpad.net/fuel/+bug/1559136 https://bugs.launchpad.net/mos/+bug/1561894
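One way to satisfy the stated invariant is to subtract from a base score the number of nodes that started strictly earlier. This is a sketch only: the function name `rank_score`, the base of 1000, and the assumption of fewer than 1000 nodes are illustrative and not taken from the script, which may compute scores differently.

```sh
#!/bin/sh
# rank_score MY_START OTHER_START...
# Prints a promotion score: 1000 minus the number of other nodes that started
# strictly earlier than this one. Start times are epoch seconds.
# Hypothetical helper; assumes fewer than 1000 nodes.
rank_score() {
    my_start=$1
    shift
    older=0
    for other_start in "$@"; do
        if [ "$other_start" -lt "$my_start" ]; then
            older=$((older + 1))
        fi
    done
    echo $((1000 - older))
}

# Three nodes started at t=100, t=200 and t=300.
rank_score 100 200 300
rank_score 200 100 300
rank_score 300 100 200
```

A node that started later always sees at least one more "older" node than any node that started before it, so its score is strictly smaller.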
Commit 091a028
-
[OCF HA] Enhance split-brain detection logic
Previous split brain logic worked as follows: each slave checked that it is connected to master. If check fails, slave restarts. The ultimate flaw in that logic is that there is little guarantee that master is alive at the moment. Moreover, if master dies, it is very probable that during the next monitor check slaves will detect its death and restart, causing complete RabbitMQ cluster downtime. With the new approach master node checks that slaves are connected to it and orders them to restart if they are not. The check is performed after master node health check, meaning that at least that node survives. Also, orders expire in one minute and freshly started node ignores orders to restart for three minutes to give cluster time to stabilize. Also corrected the problem, when node starts and is already clustered. In that case OCF script forgot to start the RabbitMQ app, causing subsequent restart. Now we ensure that RabbitMQ app is running. The two introduced attributes rabbit-start-phase-1-time and rabbit-ordered-to-restart are made private. In order to allow master to set node's order to restart, both ocf_update_private_attr and ocf_get_private_attr signatures are expanded to allow passing node name. Finally, a bug is fixed in ocf_get_private_attr. Unlike crm_attribute, attrd_updater returns empty string instead of "(null)", when an attribute is not defined on needed node, but is defined on some other node. Correspondingly changed code to expect empty string, not a "(null)". This fix is a fix for Fuel bugs https://bugs.launchpad.net/fuel/+bug/1559136 https://bugs.launchpad.net/mos/+bug/1561894
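The two timing rules (orders expire after one minute; a freshly started node ignores orders for three minutes) can be sketched as a single predicate. The function name, the constants' names, and the exact boundary comparisons are assumptions made for illustration; only the one-minute and three-minute durations come from the commit message.

```sh
#!/bin/sh
ORDER_TTL=60        # restart orders expire after one minute
STARTUP_GRACE=180   # a fresh node ignores orders for three minutes

# should_restart NOW ORDER_TIME START_TIME
# Returns 0 (true) when a restart order must be obeyed. All arguments are
# epoch seconds; an empty ORDER_TIME means no order was recorded.
# Hypothetical helper, not the script's actual function.
should_restart() {
    now=$1
    order_time=$2
    start_time=$3
    # No order recorded.
    [ -n "$order_time" ] || return 1
    # The order is stale.
    [ $((now - order_time)) -le "$ORDER_TTL" ] || return 1
    # The node started too recently.
    [ $((now - start_time)) -ge "$STARTUP_GRACE" ] || return 1
    return 0
}

if should_restart 1000 990 100; then
    echo "restart ordered"
fi
```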
Commit ab1a510
-
Commit 7199f04
Commits on Aug 23, 2016
-
Monitor rabbitmq from OCF with less overhead
This will stop wasting network bandwidth for monitoring. E.g. a 200-node OpenStack installation produces around 10k queues and 10k channels. Doing a single list_queues/list_channels in the cluster in this environment results in 27k TCP packets and around 12 megabytes of network traffic. Given that these calls happen ~10 times a minute with 3 controllers, it results in pretty significant overhead. To enable those features you should have rabbitmq containing the following patches: - rabbitmq/rabbitmq-server#883 - rabbitmq/rabbitmq-server#911 - rabbitmq/rabbitmq-server#915
Commit f75cdde
-
Commit 2d0c979
Commits on Aug 26, 2016
-
Perform partition checks from OCF HA script
Partitioned nodes are ordered to restart by master. It may sound like `autoheal`, but the problem is that OCF script and `autoheal` are not compatible because concepts of master in pacemaker and winner in autoheal are completely unrelated.
Commit 6d7c0f2
-
Commit 6e54e16
Commits on Aug 31, 2016
-
Commit 208bb82
Commits on Sep 6, 2016
-
[OCF HA] Do not suggest to run the second monitor action
Right now we suggest to users to run the second monitor for slaves with depth=30. It made sense previously, when there was an additional check at that depth. Right now we don't have any depth-specific checks and hence it does not make sense to run the second monitor. Moreover, removing the second monitor fixes an issue with Pacemaker not reacting on failing monitor if it takes more than a minute. For details see Fuel bug https://launchpad.net/bugs/1618843
Commit ad4c5d0
Commits on Sep 7, 2016
-
Commit fa69a41
-
[OCF HA] Delete Mnesia schema on mnesia reset
Not doing so leads to RabbitMQ node being half-stuck in cluster. As a result, it can't clearly join back and constantly fails. Details could be found in the following Fuel bug: https://bugs.launchpad.net/fuel/+bug/1620649
Commit c5f0563
Commits on Sep 12, 2016
-
Commit 1f62529
Commits on Sep 16, 2016
-
Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1506423 Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Bogdan Dobrelya committed Sep 16, 2016
Commit 9e4db7d
-
Commit 523ce6b
Commits on Sep 21, 2016
-
Commit 7e39ca9
-
Commit 38b51b3
Commits on Sep 23, 2016
-
Commit 731d972
Commits on Sep 29, 2016
-
OCF RA: Check partitions on non-master nodes
Partitions reported by `rabbit_node_monitor:partitions/0` are not commutative (i.e. node1 can report itself as partitioned with node2, but not vice versa). Given that we now have strong notion of master in OCF script, we can check for those fishy situations during master health check, and order damaged nodes to restart. Fuel bug: https://bugs.launchpad.net/fuel/+bug/1628487
Commit 63bf153
Commits on Oct 17, 2016
-
Correctly return exit code from stop
Panicking and returning non-success on stop often leads to resource becoming unmanaged on that node. Before we called get_status to verify that RabbitMQ is dead. But sometimes it returns error even though RabbitMQ is not running. There is no reason to call it - we will just verify that there is no beam process running. Related fuel bug - https://bugs.launchpad.net/fuel/+bug/1626933
Commit 2f6ec13
Commits on Mar 31, 2017
-
OCF RA: Don't hardcode primitive name in rabbitmq-server-ha.ocf
We can compute the name of the primitive automatically from environment variables, instead of hard-coding p_rabbitmq-server; this makes the resource agent more flexible. Closes rabbitmq/rabbitmq-server-release#23
Commit fffe28a
-
OCF RA: Don't hardcode primitive name in rabbitmq-server-ha.ocf
We can compute the name of the primitive automatically from environment variables, instead of hard-coding p_rabbitmq-server; this makes the resource agent more flexible. Closes rabbitmq/rabbitmq-server-release#23
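The commit message does not say which environment variables are used. One plausible approach, shown here purely as an assumption, is to take the instance name that Pacemaker passes in `OCF_RESOURCE_INSTANCE` and strip any clone suffix such as `:1`. The helper name `resource_primitive_name` is hypothetical.

```sh
#!/bin/sh
# resource_primitive_name INSTANCE
# Hypothetical helper: strips a clone instance suffix such as ":0" from a
# resource instance name and prints the bare primitive name.
resource_primitive_name() {
    printf '%s\n' "${1%%:*}"
}

# In an agent, the argument would typically be "$OCF_RESOURCE_INSTANCE".
resource_primitive_name "p_rabbitmq-server:1"
```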
Commit ccfc617
Commits on Apr 2, 2017
-
Commit bb39f85
Commits on Apr 4, 2017
-
OCF RA: Add default_vhost parameter to rabbitmq-server-ha.ocf
This enables the cluster to focus on a vhost that is not /, in case the most important vhost is something else. For reference, other vhosts may exist in the cluster, but these are not guaranteed to not suffer from any data loss. This patch doesn't address this issue. Closes rabbitmq/rabbitmq-server-release#22
Commit c6f95aa
-
OCF RA: Add new limit_nofile parameter to rabbitmq-server-ha OCF RA
This makes it possible to change the limit of open files, as the default on distributions is usually too low for rabbitmq. Default is 65535.
Commit c3434f1
-
Merge pull request ClusterLabs#24 from vuntz/ocf-vhost
OCF RA: Add vhost parameter to rabbitmq-server-ha.ocf
Commit 10cf912
-
OCF RA: Only set limit for open files when higher than current value
This allows the limit to be set in some other way.
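A minimal sketch of "raise only when higher", assuming a shell whose `ulimit` builtin supports `-n` (bash and dash do). The function name `raise_nofile_limit` is hypothetical, and the handling of an `unlimited` current value is an assumption.

```sh
#!/bin/sh
# raise_nofile_limit WANTED
# Hypothetical helper: raises the soft open-files limit only if WANTED is
# higher than the current value, so a larger limit set elsewhere is kept.
raise_nofile_limit() {
    wanted=$1
    current=$(ulimit -n)
    # Nothing to do when the current limit is already unlimited.
    if [ "$current" = "unlimited" ]; then
        return 0
    fi
    if [ "$wanted" -gt "$current" ]; then
        ulimit -n "$wanted"
    fi
}

# A request below the current limit leaves the limit untouched.
raise_nofile_limit 1
```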
Commit 17eb6d8
Commits on Apr 5, 2017
-
Merge pull request ClusterLabs#21 from vuntz/ocf-limit_nofile
OCF RA: Add new limit_nofile parameter to both OCF resource agents
Commit 9acc77b
-
Manually backport ClusterLabs#20, ClusterLabs#21, ClusterLabs#24, ClusterLabs#25 by @untz and @aplanas to stable
Commit b29f0e5
-
Commit 4743057
-
Manually backport ClusterLabs#20, ClusterLabs#21, ClusterLabs#24, ClusterLabs#25 by @vuntz and @aplanas to stable
Commit 51fe230
-
Commit 695dd24
Commits on May 9, 2017
-
Some parts of ClusterLabs#21 have not been added to the stable branch. This change fixes the issue by adding missing changes to rabbitmq-server-ha.ocf and also fixing rabbitmq-server.ocf
Commit e154327
Commits on May 16, 2017
-
Commit ba1479f
Commits on Dec 8, 2017
-
OCF RA: Avoid promoting nodes with same start time as master
It may happen that two nodes have the same start time, and one of these is the master. When this happens, the node actually gets the same score as the master and can get promoted. There's no reason to avoid being stable here, so let's keep the same master in that scenario.
Commit cb09e8f
-
OCF RA: Fix test for no node in start notification handler
If there's nothing starting and nothing active, then we do a -z " ", which doesn't have the same result as -z "". Instead, just test for emptiness for each set of nodes.
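The difference between the two tests is easy to reproduce. In this sketch the variable names `starting` and `active` are illustrative stand-ins for the two node sets mentioned above.

```sh
#!/bin/sh
starting=""
active=""

# Joining two empty sets with a space yields " ", which is not an empty string.
if [ -z "${starting} ${active}" ]; then joined_empty=yes; else joined_empty=no; fi

# Testing each set separately gives the intended answer.
if [ -z "$starting" ] && [ -z "$active" ]; then each_empty=yes; else each_empty=no; fi

echo "joined test says empty: $joined_empty"
echo "per-set test says empty: $each_empty"
```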
Commit 431644a
-
OCF RA: Do not start rabbitmq if notification of start is not about us
Right now, every time we get a start notification, all nodes will ensure the rabbitmq app is started. This makes little sense, as nodes that are already active don't need to do that. On top of that, this had the side effect of updating the start time for each of these nodes, which could result in the master moving to another node.
Commit 263047c
-
OCF RA: Fix logging in start notification handler
The "post-start end" log message was written too early (some things were still done afterwards), and not in all cases (it was inside an if statement).
Commit 044250f
Commits on Dec 12, 2017
-
Merge pull request ClusterLabs#64 from vuntz/ocf-fix-notify-start
OCF RA: Fix various issues with start notification handler
Commit 3183a2c
Commits on Dec 14, 2017
-
OCF RA: Avoid promoting nodes with same start time as master
It may happen that two nodes have the same start time, and one of these is the master. When this happens, the node actually gets the same score as the master and can get promoted. There's no reason to avoid being stable here, so let's keep the same master in that scenario. (cherry picked from commit 62a4f7561171328cd1d62cab394d0bba269ea7ad) (cherry picked from commit 861f2a57f916a9829e9a11092ada2bb52bdaf028)
Commit 9a95a2c
-
(cherry picked from commit a9b4a4ff97a96e798de51933fc44f61aa6bc88a3)
Commit c0688a9
-
(cherry picked from commit a9b4a4ff97a96e798de51933fc44f61aa6bc88a3)
Commit 46c3fd2
Commits on Dec 18, 2017
-
Merge branch 'rabbitmq-server-release-153734997' into rabbitmq-server-release-153734997-master
Commit a0d992f
Commits on Dec 20, 2017
-
OCF RA: Do not consider local failures as remote node problems
In is_clustered_with(), commands that we run to check if the node is clustered with us, or partitioned with us may fail. When they fail, it actually doesn't tell us anything about the remote node. Until now, we were considering such failures as hints that the remote node is not in a sane state with us. But doing so has pretty negative impact, as it can cause rabbitmq to get restarted on the remote node, causing quite some disruption. So instead of doing this, ignore the error (it's still logged). There was a comment in the code wondering what is the best behavior; based on experience, I think preferring stability is the slightly more acceptable poison between the two options.
Commit fac5c26
Commits on Nov 19, 2018
-
Use ocf_attribute_target instead of crm_node
Instead of calling crm_node directly it is preferable to use the ocf_attribute_target function. This function will return crm_node -n as usual, except when run inside a bundle (aka container in pcmk language). Inside a bundle it will return the bundle name or, if the meta attribute meta_container_attribute_target is set to 'host', it will return the physical node name where the bundle is running. Typically when running a rabbitmq cluster inside containers it is desired to set 'meta_container_attribute_target=host' on the rabbit cluster resource so that the RA is aware of which host it is running on. Tested both on baremetal (without containers): Master/Slave Set: rabbitmq-master [rabbitmq] Masters: [ controller-0 controller-1 controller-2 ] And with bundles as well. Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Commit 478442b
Commits on Mar 21, 2019
-
This commit updates URLs to prefer the https protocol. Redirects are not followed to avoid accidentally expanding intentionally shortened URLs (i.e. if using a URL shortener). # Fixed URLs ## Fixed Success These URLs were switched to an https URL with a 2xx status. While the status was successful, your review is still recommended. * [ ] http://www.apache.org/licenses/LICENSE-2.0 with 1 occurrences migrated to: https://www.apache.org/licenses/LICENSE-2.0 ([https](https://www.apache.org/licenses/LICENSE-2.0) result 200).
Commit b2788dd
Commits on Jan 31, 2020
-
Allow operator to disable iptables client blocking
Currently the resource agent hard-codes iptables calls to block off client access before the resource becomes master. This was done historically because many libraries were fairly buggy detecting a not-yet functional rabbitmq, so they were being helped by getting a tcp RST packet and they would go on trying their next configured server. It makes sense to be able to disable this behaviour because most libraries by now have gotten better at detecting timeouts when talking to rabbit and because when you run rabbitmq inside a bundle (pacemaker term for a container with an OCF resource inside) you normally do not have access to iptables. Tested by creating a three-node bundle cluster inside a container: Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest] Replica[0] rabbitmq-bundle-podman-0 (ocf::heartbeat:podman): Started controller-0 rabbitmq-bundle-0 (ocf::pacemaker:remote): Started controller-0 rabbitmq (ocf::rabbitmq:rabbitmq-server-ha): Master rabbitmq-bundle-0 Replica[1] rabbitmq-bundle-podman-1 (ocf::heartbeat:podman): Started controller-1 rabbitmq-bundle-1 (ocf::pacemaker:remote): Started controller-1 rabbitmq (ocf::rabbitmq:rabbitmq-server-ha): Master rabbitmq-bundle-1 Replica[2] rabbitmq-bundle-podman-2 (ocf::heartbeat:podman): Started controller-2 rabbitmq-bundle-2 (ocf::pacemaker:remote): Started controller-2 rabbitmq (ocf::rabbitmq:rabbitmq-server-ha): Master rabbitmq-bundle-2 The ocf resource was created inside a bundle with: pcs resource create rabbitmq ocf:rabbitmq:rabbitmq-server-ha avoid_using_iptables="true" \ meta notify=true container-attribute-target=host master-max=3 ordered=true \ op start timeout=200s stop timeout=200s promote timeout=60s bundle rabbitmq-bundle Signed-off-by: Michele Baldessari <michele@acksyn.org>
Commit f489110
Commits on Nov 13, 2020
-
Merge remote-tracking branch 'rabbitmq_server_release/master'
Corresponding to master at 7b25a1cdb1bf9e5920f4394efc3096fbcf09de1f
Commit 9214437
Commits on Feb 28, 2021
-
Allow rabbitmq to run in a larger cluster composed of also non-rabbitmq nodes
We introduce the OCF_RESKEY_allowed_cluster_nodes parameter, which can be used to specify which nodes of the cluster rabbitmq is expected to run on. When this variable is not set, the resource agent assumes that all nodes of the cluster (output of crm_node -l) are eligible to run rabbitmq. The use case here is clusters that have a large number of nodes, where only a specific subset is used for rabbitmq (usually this is done with some constraints). Tested in a 9-node cluster as follows:
[root@messaging-0 ~]# pcs resource config rabbitmq
Resource: rabbitmq (class=ocf provider=rabbitmq type=rabbitmq-server-ha)
Attributes: allowed_cluster_nodes="messaging-0 messaging-1 messaging-2" avoid_using_iptables=true
Meta Attrs: container-attribute-target=host master-max=3 notify=true ordered=true
Operations: demote interval=0s timeout=30 (rabbitmq-demote-interval-0s)
monitor interval=5 timeout=30 (rabbitmq-monitor-interval-5)
monitor interval=3 role=Master timeout=30 (rabbitmq-monitor-interval-3)
notify interval=0s timeout=20 (rabbitmq-notify-interval-0s)
promote interval=0s timeout=60s (rabbitmq-promote-interval-0s)
start interval=0s timeout=200s (rabbitmq-start-interval-0s)
stop interval=0s timeout=200s (rabbitmq-stop-interval-0s)
[root@messaging-0 ~]# pcs status |grep -e rabbitmq -e messaging
* Online: [ controller-0 controller-1 controller-2 database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 ]
...
* Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
* rabbitmq-bundle-0 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-0
* rabbitmq-bundle-1 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-1
* rabbitmq-bundle-2 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-2
Commit 3a9253a
-
Stop logging unblock client access unconditionally
Currently every call to unblock_client_access() is followed by a log line showing which function requested the unblocking. When we pass the parameter OCF_RESKEY_avoid_using_iptables=true it makes no sense to log unblocking of iptables since it is effectively a no-op. Let's move that logging inside the unblock_client_access() function allowing a parameter to log which function called it. Tested on a cluster with rabbitmq bundles with avoid_using_iptables=true and observed no spurious logging any longer: [root@messaging-0 ~]# journalctl |grep 'unblocked access to RMQ port' |wc -l 0
Commit 4d68998
-
Only export RABBITMQ_NODE_PORT when it is not the default
RABBITMQ_NODE_PORT is exported by default and set to 5672. Re-exporting it in that case breaks the setup where rabbit runs with TLS on the default port:

2021-02-28 07:44:10.732 [error] <0.453.0> Failed to start Ranch listener {acceptor,{172,17,1,93},5672} in ranch_ssl:listen([{cacerts,'...'},{key,'...'},{cert,'...'},{ip,{172,17,1,93}},{port,5672}, inet,{keepalive,true}, {versions,['tlsv1.1','tlsv1.2']},{certfile,"/etc/pki/tls/certs/rabbitmq.crt"},{keyfile,"/etc/pki/tls/private/rabbitmq.key"}, {depth,1},{secure_renegotiate,true},{reuse_sessions,true},{honor_cipher_order,true},{verify,verify_none},{fail_if_no_peer_cert,false}]) for reason eaddrinuse (address already in use)

This is because by always exporting it explicitly, we force rabbit to listen on that port via plain TCP, and that is a problem when we want to do SSL on that port. Since 5672 is already the default port, we can simply avoid exporting it when the user does not customize the port.

Tested both in a non-TLS env (A) and in a TLS env (B) successfully:

(A) Non-TLS
[root@messaging-0 /]# grep -ir -e tls -e ssl /etc/rabbitmq
[root@messaging-0 /]#
[root@messaging-0 /]# pcs status |grep rabbitmq
* rabbitmq-bundle-0 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-0
* rabbitmq-bundle-1 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-1
* rabbitmq-bundle-2 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-2

(B) TLS
[root@messaging-0 /]# grep -ir -e tls -e ssl /etc/rabbitmq/ |head -n3
/etc/rabbitmq/rabbitmq.config: {ssl, [{versions, ['tlsv1.1', 'tlsv1.2']}]},
/etc/rabbitmq/rabbitmq.config: {ssl_listeners, [{"172.17.1.48", 5672}]},
/etc/rabbitmq/rabbitmq.config: {ssl_options, [
[root@messaging-0 ~]# pcs status |grep rabbitmq
* rabbitmq-bundle-0 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-0
* rabbitmq-bundle-1 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-1
* rabbitmq-bundle-2 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-2

Note: I don't believe we should export RABBITMQ_NODE_PORT at all, since you can specify all ports in the rabbit configuration anyway, but I prefer to play it safe here as folks might rely on being able to customize this.

Signed-off-by: Michele Baldessari <michele@acksyn.org>
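The behaviour described in this commit message can be sketched roughly as follows. This is an illustrative sketch, not the agent's actual code: `DEFAULT_NODE_PORT` and `maybe_export_node_port` are hypothetical names, and the real agent may read the port from a resource parameter instead of a function argument.

```shell
# Sketch only: DEFAULT_NODE_PORT and maybe_export_node_port are hypothetical names.
DEFAULT_NODE_PORT=5672

maybe_export_node_port() {
    # $1: the port configured for the resource (may be empty)
    port="${1:-$DEFAULT_NODE_PORT}"
    if [ "$port" != "$DEFAULT_NODE_PORT" ]; then
        # Custom port: hand it to RabbitMQ through the environment.
        RABBITMQ_NODE_PORT="$port"
        export RABBITMQ_NODE_PORT
    fi
    # Default port: export nothing, so a TLS listener configured on 5672
    # in rabbitmq.config does not collide with a plain TCP listener.
}
```

With the default port the variable is left untouched, so the listeners from the configuration file are the only ones RabbitMQ tries to open.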
Commit c61b5df
Commits on Mar 1, 2021
-
Allow rabbitmq to run in a larger cluster composed of also non-rabbitmq nodes
We introduce the OCF_RESKEY_allowed_cluster_nodes parameter, which can be used to specify which nodes of the cluster rabbitmq is expected to run on. When this variable is not set, the resource agent assumes that all nodes of the cluster (output of crm_node -l) are eligible to run rabbitmq. The use case here is clusters that have a large number of nodes, where only a specific subset is used for rabbitmq (usually this is done with some constraints).

Tested in a 9-node cluster as follows:

[root@messaging-0 ~]# pcs resource config rabbitmq
Resource: rabbitmq (class=ocf provider=rabbitmq type=rabbitmq-server-ha)
 Attributes: allowed_cluster_nodes="messaging-0 messaging-1 messaging-2" avoid_using_iptables=true
 Meta Attrs: container-attribute-target=host master-max=3 notify=true ordered=true
 Operations: demote interval=0s timeout=30 (rabbitmq-demote-interval-0s)
             monitor interval=5 timeout=30 (rabbitmq-monitor-interval-5)
             monitor interval=3 role=Master timeout=30 (rabbitmq-monitor-interval-3)
             notify interval=0s timeout=20 (rabbitmq-notify-interval-0s)
             promote interval=0s timeout=60s (rabbitmq-promote-interval-0s)
             start interval=0s timeout=200s (rabbitmq-start-interval-0s)
             stop interval=0s timeout=200s (rabbitmq-stop-interval-0s)

[root@messaging-0 ~]# pcs status |grep -e rabbitmq -e messaging
* Online: [ controller-0 controller-1 controller-2 database-0 database-1 database-2 messaging-0 messaging-1 messaging-2 ]
...
* Container bundle set: rabbitmq-bundle [cluster.common.tag/rhosp16-openstack-rabbitmq:pcmklatest]:
* rabbitmq-bundle-0 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-0
* rabbitmq-bundle-1 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-1
* rabbitmq-bundle-2 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-2
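The node selection described above might look something like the sketch below. The function names `list_all_cluster_nodes` and `eligible_rabbit_nodes` are hypothetical, and the sketch assumes that `crm_node -l` prints one node per line with the node name in the second column; the real agent may filter that output further.

```shell
# Hypothetical helper: names of all cluster members, one per line.
# Assumes `crm_node -l` prints "<id> <name> <state>" for each node.
list_all_cluster_nodes() {
    crm_node -l | awk '{print $2}'
}

# Print the nodes considered eligible to run rabbitmq, one per line.
eligible_rabbit_nodes() {
    if [ -n "${OCF_RESKEY_allowed_cluster_nodes:-}" ]; then
        # Space-separated list from the resource attribute; the unquoted
        # expansion is intentional so that each name becomes its own line.
        printf '%s\n' $OCF_RESKEY_allowed_cluster_nodes
    else
        # Parameter not set: fall back to every node in the cluster.
        list_all_cluster_nodes
    fi
}
```

Keeping the fallback in a separate function means the parameter stays optional and existing deployments that never set it see the same node list as before.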
Commit 54d190d
-
Stop logging unblock client access unconditionally
Currently every call to unblock_client_access() is followed by a log line showing which function requested the unblocking. When we pass the parameter OCF_RESKEY_avoid_using_iptables=true, it makes no sense to log unblocking of iptables, since it is effectively a no-op. Let's move that logging inside the unblock_client_access() function and give it a parameter that names the calling function.

Tested on a cluster with rabbitmq bundles with avoid_using_iptables=true and observed no spurious logging any longer:

[root@messaging-0 ~]# journalctl |grep 'unblocked access to RMQ port' |wc -l
0
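A minimal sketch of the refactoring described above, assuming stand-in helpers: `log_info` and `remove_iptables_block` are hypothetical names used so the example runs outside Pacemaker, and the real agent's logging call and iptables handling will differ.

```shell
# Stand-ins so the sketch runs outside Pacemaker; both names are hypothetical.
log_info() { echo "INFO: $*"; }
remove_iptables_block() { :; }   # the real agent would delete its iptables rule here

unblock_client_access() {
    # $1: name of the calling function, used only for the log line
    caller="${1:-unknown}"
    if [ "${OCF_RESKEY_avoid_using_iptables:-false}" = "true" ]; then
        # Nothing was ever blocked, so there is nothing to undo or to log.
        return 0
    fi
    remove_iptables_block
    log_info "${caller}: unblocked access to RMQ port"
}
```

Callers then pass only their own name, for example `unblock_client_access "get_monitor"` (a hypothetical caller), and no longer emit a log line of their own.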
Commit 8c4055c
-
Only export RABBITMQ_NODE_PORT when it is not the default
RABBITMQ_NODE_PORT is exported by default and set to 5672. Re-exporting it in that case breaks the setup where rabbit runs with TLS on the default port:

2021-02-28 07:44:10.732 [error] <0.453.0> Failed to start Ranch listener {acceptor,{172,17,1,93},5672} in ranch_ssl:listen([{cacerts,'...'},{key,'...'},{cert,'...'},{ip,{172,17,1,93}},{port,5672}, inet,{keepalive,true}, {versions,['tlsv1.1','tlsv1.2']},{certfile,"/etc/pki/tls/certs/rabbitmq.crt"},{keyfile,"/etc/pki/tls/private/rabbitmq.key"}, {depth,1},{secure_renegotiate,true},{reuse_sessions,true},{honor_cipher_order,true},{verify,verify_none},{fail_if_no_peer_cert,false}]) for reason eaddrinuse (address already in use)

This is because by always exporting it explicitly, we force rabbit to listen on that port via plain TCP, and that is a problem when we want to do SSL on that port. Since 5672 is already the default port, we can simply avoid exporting it when the user does not customize the port.

Tested both in a non-TLS env (A) and in a TLS env (B) successfully:

(A) Non-TLS
[root@messaging-0 /]# grep -ir -e tls -e ssl /etc/rabbitmq
[root@messaging-0 /]#
[root@messaging-0 /]# pcs status |grep rabbitmq
* rabbitmq-bundle-0 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-0
* rabbitmq-bundle-1 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-1
* rabbitmq-bundle-2 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-2

(B) TLS
[root@messaging-0 /]# grep -ir -e tls -e ssl /etc/rabbitmq/ |head -n3
/etc/rabbitmq/rabbitmq.config: {ssl, [{versions, ['tlsv1.1', 'tlsv1.2']}]},
/etc/rabbitmq/rabbitmq.config: {ssl_listeners, [{"172.17.1.48", 5672}]},
/etc/rabbitmq/rabbitmq.config: {ssl_options, [
[root@messaging-0 ~]# pcs status |grep rabbitmq
* rabbitmq-bundle-0 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-0
* rabbitmq-bundle-1 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-1
* rabbitmq-bundle-2 (ocf::rabbitmq:rabbitmq-server-ha): Master messaging-2

Note: I don't believe we should export RABBITMQ_NODE_PORT at all, since you can specify all ports in the rabbit configuration anyway, but I prefer to play it safe here as folks might rely on being able to customize this.

Signed-off-by: Michele Baldessari <michele@acksyn.org>
Commit 7410979
Commits on Mar 4, 2021
-
Merge pull request #2864 from rabbitmq/mk-lager-3-9-0
Upgrade Lager to 3.9 for OTP 24 compatibility
Commit bff3727
Commits on Jun 17, 2021
-
Commit 8799071
Commits on Jun 30, 2021
-
OCF RA: fix start/stop handling
In newer Erlang, beam.smp no longer writes a pidfile until the rabbit application starts. It also no longer passes -mnesia dir and -sname, which are required in order to start the node while only delaying the application start-up. Handle that so the Pacemaker HA setup keeps working with newer Erlang and rabbitmq-server versions. Fix '[ x == x ]' bashisms as well to silence errors in the RA logs.

Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
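For context on the bashism mentioned above: POSIX `test` defines `=` for string equality, while `==` is an extension that some `/bin/sh` implementations reject with an error. A small illustration, where `is_blank` is a hypothetical helper and not a function from the agent:

```shell
# is_blank is a hypothetical helper name used only for this illustration.
is_blank() {
    # Portable form: a single "=" compares strings in POSIX test(1).
    # Writing [ "x$1" == "x" ] instead can fail under shells such as dash.
    [ "x${1:-}" = "x" ]
}

if is_blank ""; then
    echo "empty value detected"
fi
```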
Commit 06aa9a8
Commits on Oct 1, 2021
-
Commit 07301d2
-
Milestone history for rabbitmq OCF RA from Fuel
The rabbitmq OCF RA originated in Fuel for OpenStack. Restore the history of changes for it. This commit is empty and only sets a milestone for its Fuel history ending at Tue May 10 15:27:53 2016.

Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
Commit 006f625
-
Ignore stderr when calling rabbitmqctl eval()
Every time we recompile the erlang/elixir/rebar/rabbitmq stack, there are one or more new warnings that completely trip up any parsing of the output of these commands. Most end up being bugs that get fixed later on [1]. Since stderr is rarely interesting and just holds any rebase up, let's ignore it when running these rabbitmqctl commands.

[1] https://elixirforum.com/t/mix-local-hex-warning-authenticity-is-not-established-by-certificate-path-validation/39665

Authored-by: Michele Baldessari <michele@acksyn.org>
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
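The idea of discarding stderr so that only stdout is parsed can be sketched as below. `run_quiet` is a hypothetical wrapper name; the agent itself may simply append `2>/dev/null` to each rabbitmqctl call.

```shell
# Hypothetical wrapper: run a command, keep its stdout and exit status,
# and discard anything it writes to stderr (for example toolchain warnings).
run_quiet() {
    "$@" 2>/dev/null
}

# In the agent this could wrap calls such as:
#   run_quiet rabbitmqctl eval 'node().'
```

The trade-off is that real error messages on stderr are dropped as well, so the caller has to rely on the exit status to detect failures.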
Commit a935311
-
Commit ecc7231
-
New home for rabbitmq-server-ha RA OCF m/s HA
Moving it from repo: https://github.com/rabbitmq/rabbitmq-server
The original path: scripts/rabbitmq-server-ha.ocf

Also preserve the list of authors and the change history since its very first commit in the Fuel for OpenStack project, now archived: https://github.com/openstack-archive/fuel-library

To get the history use:
$ git log --follow heartbeat/rabbitmq-server-ha.ocf

Reasoning behind the move: the OCF RA script provides an M/S HA Pacemaker resource for a RabbitMQ cluster and better fits this place.

Background
==========
It's been actively maintained for years. It now needs a new home at the request of the RabbitMQ team, since it is no longer possible to run CI tests for changes proposed against it at the old location. The TripleO upstream project and its layered RH OSP product plan to adopt this OCF RA. That guarantees its future maintenance and support.

How it works
============
Documentation is kept at its original upstream location: https://www.rabbitmq.com/pacemaker.html#auto-pacemaker

Future Plans
============
Once it is there, the package builds for RDO will pick up changes to this OCF RA and make sure it's CI'ed, also in TripleO and OSP.

Status Quo
==========
Until the adoption completes, I'm planning to test changes proposed at this new location in my fork, with GitHub Actions, like [0]. The CI runs on pre-built images [1] and vagrant scripts [2] that I maintain for the (more or less) recent Pacemaker and RabbitMQ builds. The test coverage includes a simple cluster assembly smoke test and a sophisticated Jepsen test case that verifies auto-healing of the cluster resource in Pacemaker managed by this OCF RA.

[0] https://github.com/bogdando/rabbitmq-server/runs/3757495446
[1] https://hub.docker.com/r/bogdando/rabbitmq-cluster-ocf
[2] https://github.com/bogdando/rabbitmq-cluster-ocf-vagrant

Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
Commit 8d154d8
Commits on Nov 3, 2021
-
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
Commit 3555d9e