
fix: upgrade redis-ha chart and enable haproxy #3147

Merged

Conversation

shelby-moore (Contributor)

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Optional. My organization is added to the README.
  • I've signed the CLA and my build is green (troubleshooting builds).

shelby-moore commented Feb 21, 2020

This is still a bit rough, but it's one potential solution for the unknown app state issue mentioned in #3070.

With the current HA manifests, we run into errors where ArgoCD attempts to write to a Redis replica:

level=error msg="Failed to set app state: READONLY You can't write against a read only replica."

I took a look at the Redis HA Helm chart and saw that ArgoCD is currently on 3.3.1. The latest version is 4.3.4, which includes a fairly significant change intended to address this issue by running haproxy in front of Redis (helm/charts#15305).

In this PR, I have bumped the redis-ha chart version the ArgoCD HA installation uses to 4.3.4 and modified the Kustomizations to add the appropriate labels to the new haproxy components. I regenerated the installation manifests using make manifests-local. I also modified the ArgoCD cache code so that it never uses the failover client, as it no longer needs to worry about master failover with haproxy running in front of Redis. I used the newly generated installation manifest to update ArgoCD on a cluster I've been experiencing this issue on, and have not run into it since.
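
To make the change concrete, here is a minimal sketch (not the code from this PR) contrasting the two connection modes with the go-redis client; the sentinel master name and the haproxy service address below are illustrative assumptions.

```go
package main

import "github.com/go-redis/redis"

// Sentinel mode: the client asks the sentinels for the current master and
// follows failovers itself; a stale answer can land writes on a read-only
// replica, producing the READONLY error above.
func newSentinelClient(sentinelAddrs []string) *redis.Client {
	return redis.NewFailoverClient(&redis.FailoverOptions{
		MasterName:    "argocd", // assumed sentinel master name
		SentinelAddrs: sentinelAddrs,
	})
}

// HAProxy mode: the client talks to one stable address; haproxy health-checks
// the Redis pods and only forwards connections to the current master.
func newHAProxyClient() *redis.Client {
	return redis.NewClient(&redis.Options{
		Addr: "argocd-redis-ha-haproxy:6379", // assumed service address
	})
}

func main() {
	client := newHAProxyClient()
	defer client.Close()
	// Writes always reach the master, so READONLY errors go away.
	_ = client.Set("example-key", "example-value", 0).Err()
}
```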

If this is a change you're interested in accepting, I'll need some time to sort out the manifest tests.

jannfis commented Feb 22, 2020

@shelby-moore Hm, an interesting approach the Redis guys are taking there. I think moving the complexity of accessing the master Redis instance into a dedicated component makes a lot of sense from the application's point of view (less code to maintain & the same methods to access HA and non-HA Redis instances). HAProxy seems like a good choice for the frontend to Redis, too. And the fact that this architecture is endorsed by upstream and probably used a lot in the wild is another argument for it.

So, for me, this change would be more than welcome if polished up. @jessesuen and @alexmt, what's your opinion on this?

@carlosjgp

> I took a look at the Redis HA Helm chart and saw that ArgoCD is currently on 3.3.1. The latest version is 4.3.4, which includes a fairly significant change intended to address this issue by running haproxy in front of Redis (helm/charts#15305).

I read about the haproxy implementation in the redis-ha chart, and the main motivation is to provide a reliable connection from outside of the cluster (k8s, I guess) due to the s_down problem on the sentinel where the master node is running...
For me it's a little bit vague, but it points to a wrongly configured health check on the sentinel.

There is another HA Redis chart, maintained by Bitnami, that handles this problem in a different way:
https://github.com/bitnami/bitnami-docker-redis-sentinel/issues/1#issuecomment-409489669

I used the Redis HA chart in the past on production workloads, but I ended up migrating to the Bitnami chart because not all Redis clients were compatible with sentinel, and personally I found the Bitnami approach more "Kubernetised".

alexmt commented Feb 24, 2020

Thank you for the contribution, @shelby-moore!

I agree with @jannfis, it's worth switching to HAProxy if it helps to resolve #3070.

@@ -1,15 +1,15 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

bases:
Collaborator

The Codegen CI job has failed because kustomization.yaml was upgraded to kustomize 3.x. This breaks compatibility with Replicated Ship, which is stuck with kustomize 2.x. Can you please run make manifests? It should regenerate the manifests using the right kustomize version.

@@ -23,27 +23,14 @@ func NewCache(client CacheClient) *Cache {
// AddCacheFlagsToCmd adds flags which control caching to the specified command
func AddCacheFlagsToCmd(cmd *cobra.Command) func() (*Cache, error) {
	redisAddress := ""
	sentinelAddresses := make([]string, 0)
Collaborator

Please keep the ability to use failover clients. It should provide a quick way to switch back to sentinel if HAProxy is not stable.
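
For instance, both paths could stay selectable from the existing flags, roughly like the sketch below (the function name, parameters, and wiring are hypothetical, not the PR's final code):

```go
package cacheutil

import "github.com/go-redis/redis"

// buildRedisClient keeps the failover client available as an escape hatch:
// if sentinel addresses were supplied via flags, use client-side sentinel
// failover as before; otherwise connect to the single (haproxy) address.
func buildRedisClient(redisAddress, masterName string, sentinelAddresses []string) *redis.Client {
	if len(sentinelAddresses) > 0 {
		return redis.NewFailoverClient(&redis.FailoverOptions{
			MasterName:    masterName,
			SentinelAddrs: sentinelAddresses,
		})
	}
	return redis.NewClient(&redis.Options{Addr: redisAddress})
}
```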

codecov bot commented Feb 24, 2020

Codecov Report

Merging #3147 into master will not change coverage.
The diff coverage is n/a.


@@          Coverage Diff           @@
##           master   #3147   +/-   ##
======================================
  Coverage    38.6%   38.6%           
======================================
  Files         168     168           
  Lines       18269   18269           
  Branches      237     237           
======================================
  Hits         7053    7053           
  Misses      10342   10342           
  Partials      874     874


claassistantio commented Feb 24, 2020

CLA assistant check
All committers have signed the CLA.

@shelby-moore force-pushed the upgrade-redis-ha-chart-and-enable-haproxy branch from a86780d to 19d6259 on February 24, 2020 23:39
@@ -2354,6 +2484,7 @@ metadata:
    app.kubernetes.io/name: argocd-redis-ha
    app.kubernetes.io/part-of: argocd
  name: argocd-redis-ha-announce-0
  namespace: default
Member

This should not be here. The namespace should not be hard-wired into the yaml.

Contributor Author

This is being pulled in from the upstream redis-ha chart: helm/charts@e1b39e9#diff-a988587cdea159026a58cc36592ce812

I can work on a mechanism to strip it out if it should not be included.

Contributor Author

Okay, I've added a kustomize patch to strip namespace from the redis-ha manifest.

alexmt commented Feb 26, 2020

LGTM. I think it is ready to merge after addressing @jessesuen's comment: #3147 (comment)

alexmt (Collaborator) left a comment

Thank you, @shelby-moore! LGTM

CI should deploy the change to https://cd.apps.argoproj.io/applications :) We will know if the upgrade is smooth in ~20 mins :)

alexmt merged commit 2802789 into argoproj:master on Feb 26, 2020
alexmt commented Feb 27, 2020

It works, the upgrade went smoothly, and HA Redis just works! Will try it on an internal dogfood instance next.
