Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated RabbitMQ-Chart to 1.46.1 & improved Reboot-Resilience #158

Merged
merged 1 commit into from
Nov 23, 2020
Merged

Updated RabbitMQ-Chart to 1.46.1 & improved Reboot-Resilience #158

merged 1 commit into from
Nov 23, 2020

Conversation

moonrail
Copy link
Contributor

@moonrail moonrail commented Nov 20, 2020

Hello altogether

Potentially partly helps with the #11

This Pull Request updates RabbitMQs-Chart-Dependency to 1.46.1, as on our kubernetes installations (1.17, 1.18, 1.19) 1.44.1 would not run due to mysterious disk-space complaints while being correctly assigned to a PersistentVolume and being able to RW to it.

As this new RabbitMQ-Version enables Prometheus-Monitoring by default, most of current installations would fail, therefore this is disabled it by default.

Enabled rabbitmqErlangCookie, as otherwise Cluster-Data is not reusable after RabbitMQ-Deployment-Restart/-Rebuild (or short period of 0 Replicas).

Added forceBoot, as otherwise Cluster-Data is not reusable after RabbitMQ-Deployment-Restart/-Rebuild (or short period of 0 Replicas), as Mnesia Tables are not cleaned up and cause RabbitMQ to not boot up. See helm/charts#13485

So this should improve User Experience by not having to ditch PersistentVolumes after RabbitMQ-Redeployments.

Not really sure about this - but if StackStorm holds queued Executions in RabbitMQ, this would also help in disaster cases, to not loose all running Executions.

Tested with 3.3dev on kubernetes 1.17, 1.18 & 1.19.

Please let me know, if there is something to improve. :)

@pull-request-size pull-request-size bot added size/XS PR that changes 0-9 lines. Quick fix/merge. size/S PR that changes 10-29 lines. Very easy to review. and removed size/XS PR that changes 0-9 lines. Quick fix/merge. labels Nov 20, 2020
values.yaml Outdated
Comment on lines 453 to 455
# On unclean cluster restarts forceBoot is required to cleanup Mnesia tables (see: https://github.com/helm/charts/issues/13485)
forceBoot: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As every option is about trade-off, worth adding comment for this new default:

Suggested change
# On unclean cluster restarts forceBoot is required to cleanup Mnesia tables (see: https://github.com/helm/charts/issues/13485)
forceBoot: true
# On unclean cluster restarts forceBoot is required to cleanup Mnesia tables (see: https://github.com/helm/charts/issues/13485)
# Use it only if you prefer availability over integrity.
forceBoot: true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested comment is added

Copy link
Member

@arm4b arm4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for PR! 👍

Just left a few minor comments to address and we're good to merge.

@@ -458,7 +460,7 @@ rabbitmq-ha:
#rabbitmqMemoryHighWatermark: 512MB
#rabbitmqMemoryHighWatermarkType: absolute
# Up to 255 character string, should be fixed so that re-deploying the chart does not fail (see: https://github.com/helm/charts/issues/12371)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we're defaulting rabbitmqErlangCookie value for everyone, let's at least include a warning comment as recommendation to change the default.

Suggested change
# Up to 255 character string, should be fixed so that re-deploying the chart does not fail (see: https://github.com/helm/charts/issues/12371)
# Up to 255 character string, should be fixed so that re-deploying the chart does not fail (see: https://github.com/helm/charts/issues/12371)
# NB! It's highly recommended to change the default insecure rabbitmqErlangCookie value!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning is added as well

CHANGELOG.md Outdated
@@ -1,7 +1,10 @@
# Changelog

## In Development

* Update `rabbitmq-ha` 3rd party chart from `1.44.1` to `1.46.1` (#158) (by @moonrail)
* Disable newly introduced `rabbitmq-ha` prometheus operator by default (#158) (by @moonrail)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be omitted as there's too verbose changelog for a single PR and we stick with the previous default.

Suggested change
* Disable newly introduced `rabbitmq-ha` prometheus operator by default (#158) (by @moonrail)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is now removed - wasn't sure about three lines of Changelog for one PR either when writing it

@arm4b arm4b added the enhancement New feature or request label Nov 23, 2020
values.yaml Outdated
Comment on lines 461 to 465
#rabbitmqErlangCookie: 8MrqQdCQ6AQ8U3MacSubHE5RqkSfvNaRHzvxuFcG
rabbitmqErlangCookie: 8MrqQdCQ6AQ8U3MacSubHE5RqkSfvNaRHzvxuFcG
Copy link
Member

@arm4b arm4b Nov 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was, in fact proposed in older PR and it's questionable insecure default to me.

I realize that it could be useful for someone re-deploying (destroying/creating) rabbitmq-ha for many times in a row. And it also could take time to understand they need the same rabbitmqErlangCookie to make re-deployment not fail with the same PV/PVC.

But instead of forcing this rabbitmq cookie for everyone, we decided previously to hide the value under comment and include recommendation about why it could be useful or important. This way someone experiencing re-deployment issue would be able to consult with this Helm values hint.

Considering it's a second PR to enable rabbitmqErlangCookie by default, - let's do it 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was enabled in my cluster. I ran into this issue when we upgrade clusters (2 times) for Kubernetes version in EKS. AWS drains pods from old node groups and shift them to the new node groups. Even though all pods shift fine, due to the rabbitmqErlangCookie mismatch and not found in the helm chart, i was running into the issue which required me to delete the PVCs and then run helm upgrade to reconstruct the rabbitmq-ha cluster. Of course all that is a down-time. But since enabling that, i could see all pods shifted to the new nodes just fine and app was on-line the whole time! I would recommend it. Thanks!

@arm4b
Copy link
Member

arm4b commented Nov 23, 2020

Looks good!
Can you please also fix the CHANGELOG.md git conflict?

…bernetes installations.

Enabled Erlang-Cookie & Force-Boot, as otherwise Cluster-Data is not reusable after Deployment-Restart.
Due to new RabbitMQ-Version enabling Prometheus-Monitoring by default, most installations would fail, therefore disabled it by default.
@moonrail
Copy link
Contributor Author

Rebased, conflict should be gone now

Copy link
Member

@arm4b arm4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@arm4b arm4b merged commit 70e861a into StackStorm:master Nov 23, 2020
@moonrail moonrail deleted the update_rabbitmq_chart branch November 23, 2020 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request size/S PR that changes 10-29 lines. Very easy to review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants