
Redis Sentinel failure #10067

Closed

anrajme opened this issue May 7, 2022 · 3 comments

Labels
stale 15 days without activity

anrajme commented May 7, 2022

Name and Version

bitnami/redis-12.3.2

What steps will reproduce the bug?

The Redis Sentinel setup fails once in a while: the master pod (node-0) keeps restarting because the cluster has cached the previous master's IP address in its configs, and at that point the slaves are also not part of the cluster, since they are still looking for the old master's IP address.

myapp-dev-redis-node-0                           0/2     CrashLoopBackOff   8          8m16s
myapp-dev-redis-node-2                           2/2     Running            0          6h19m
%  kubectl -n mynamespace get pod myapp-dev-redis-node-0 -o wide
NAME                      READY   STATUS        RESTARTS   AGE     IP              NODE                                         NOMINATED NODE   READINESS GATES
myapp-dev-redis-node-0   0/2     Terminating   10         9m52s    172.16.103.44   vmss-agent-worker1-cluster-fvfdj00000r   <none>           <none>

Note the mismatch between the master pod's current IP above (172.16.103.44) and the stale master IP still cached in the Sentinel configs below (172.16.61.244):

127.0.0.1:26379> sentinel get-master-addr-by-name mymaster
1) "172.16.61.244"
2) "6379"
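For completeness, the full cached record for the monitored master (including its s_down/o_down flags) can also be dumped from the same Sentinel prompt; output omitted here:

127.0.0.1:26379> sentinel master mymaster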
 % kubectl -n my-namespace logs pod/myapp-dev-redis-node-1 sentinel
1:X 05 May 2022 12:33:04.517 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 05 May 2022 12:33:04.517 # Redis version=6.0.9, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 05 May 2022 12:33:04.517 # Configuration loaded
1:X 05 May 2022 12:33:04.518 * Running mode=sentinel, port=26379.
1:X 05 May 2022 12:33:04.519 # Sentinel ID is 2c67cc10f694ca25d351152cdc9d88ab12a77eca
1:X 05 May 2022 12:33:04.519 # +monitor master mymaster 172.16.61.244 6379 quorum 2
1:X 06 May 2022 23:07:18.637 # +sdown master mymaster 172.16.61.244 6379
1:X 06 May 2022 23:07:18.637 # +sdown sentinel 1f1894ef8260108383a2b859789c49beb27cf766 172.16.61.244 26379 @ mymaster 172.16.61.244 6379
1:X 06 May 2022 23:07:19.656 # +sdown slave 172.16.61.200:6379 172.16.61.200 6379 @ mymaster 172.16.61.244 6379
1:X 06 May 2022 23:07:19.656 # +sdown sentinel fa9b14dbb31f6a78fe75b773dff4f1c6b919f1a7 172.16.61.200 26379 @ mymaster 172.16.61.244 6379
 %  kubectl -n mynamespace describe pod myapp-dev-redis-node-0
<REDACTED>
Events:
  Type     Reason                  Age                   From                     Message
  ----     ------                  ----                  ----                     -------
  Normal   Scheduled               9m17s                 default-scheduler        Successfully assigned mynamespace/myapp-dev-redis-node-0 to vmss-agent-worker1-cluster-fvfdj00000r
  Warning  FailedAttachVolume      9m17s                 attachdetach-controller  Multi-Attach error for volume "pvc-ead119f8-6e61-4bf1-8130-8623aa449e37" Volume is already exclusively attached to one node and can't be attached to another
  Normal   SuccessfulAttachVolume  8m31s                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-ead119f8-6e61-4bf1-8130-8623aa449e37"
  Normal   Started                 8m11s                 kubelet                  Started container sentinel
  Normal   Pulled                  8m11s                 kubelet                  Container image "docker.io/bitnami/redis:6.0.9-debian-10-r66" already present on machine
  Normal   Created                 8m11s                 kubelet                  Created container redis
  Normal   Started                 8m11s                 kubelet                  Started container redis
  Normal   Pulled                  8m11s                 kubelet                  Container image "docker.io/bitnami/redis-sentinel:6.0.9-debian-10-r66" already present on machine
  Normal   Created                 8m11s                 kubelet                  Created container sentinel
  Warning  Unhealthy               7m42s (x5 over 8m2s)  kubelet                  Liveness probe failed:
Could not connect to Redis at localhost:26379: Connection refused
  Normal   Killing    7m42s  kubelet  Container sentinel failed liveness probe, will be restarted
  Warning  Unhealthy  7m37s  kubelet  Liveness probe failed:
Could not connect to Redis at localhost:6379: Connection refused
  Warning  Unhealthy  7m27s (x4 over 7m57s)  kubelet  Readiness probe failed:
Could not connect to Redis at localhost:6379: Connection refused
  Warning  Unhealthy  3m7s (x58 over 8m2s)  kubelet  Readiness probe failed:
Could not connect to Redis at localhost:26379: Connection refused

I understand this is a somewhat outdated version of Redis, but I don't see any changes related to this merged in the latest releases, so I assume this would be a problem on the latest versions as well.

Also, how can I make Sentinel use the hostname in its configs instead of caching the master's IP address? If Sentinel stored the master host as "myapp-dev-redis-node-0" instead of "172.16.61.244", I expect this issue would not happen.
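For reference, Redis 6.2 added hostname support to Sentinel, so a deployment running a newer Redis image could in principle announce and resolve hostnames instead of IPs. A minimal sentinel.conf sketch (the DNS name below is only an assumption based on this release's naming, not something taken from the chart):

# Redis >= 6.2 only; 6.0.9 has no Sentinel hostname support
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
# the monitored address can then be a stable DNS name instead of a pod IP
sentinel monitor mymaster myapp-dev-redis-node-0.myapp-dev-redis-headless.mynamespace.svc.cluster.local 6379 2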

Are you using any custom parameters or values?

redis:
  enabled: true
  cluster:
    slaveCount: 3
  sentinel:
    enabled: true
    staticID: true
  networkPolicy:
    enabled: false
  usePassword: false
  metrics:
    enabled: false

What is the expected behavior?

Cluster working as expected.

What do you see instead?

Sentinel failing often.

Additional information

No response

@javsalgar
Contributor

Hi,

I recall making some changes in the chart to work around issues with the Sentinel configuration. Could you check with the latest version of the chart? Do you have any steps to reproduce the issue consistently?
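In case it helps, one way to retest against the latest chart (the release, namespace and values file names here are assumed from the report, and values keys may have been renamed between major chart versions, so the old values file may need adjusting):

helm repo update
helm upgrade myapp-dev-redis bitnami/redis --namespace mynamespace -f my-values.yaml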

@github-actions

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label May 25, 2022
@github-actions

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
