This repository has been archived by the owner on Oct 11, 2023. It is now read-only.

dev-spaces interferes with Redis #32

Closed
antogh opened this issue Oct 24, 2018 · 11 comments

Comments

@antogh

antogh commented Oct 24, 2018

Yesterday I installed dev-spaces for the first time on my AKS cluster (which I'm using for learning and experiments, not in production).

Right after the installation everything was fine: I could debug a containerized asp.net core app from my VS2017 directly on the kubernetes cluster. This asp.net core app reads a redis cache that is also installed in the cluster, as a stateful set of 3 pods with 2 containers each (redis + sentinel); the 1st pod is a redis master, the other 2 are slaves. I had used this setup for 10 days and it was working fine.

Each evening I deallocate the VMs in the cluster and restart them in the morning. Kubernetes takes care of restarting all the pods. This worked for 10 days before I installed dev-spaces.

This morning, when I restarted the cluster VMs, redis was not working. I had plenty of connection errors in the log. Master and slaves could not communicate anymore. I restarted the pods multiple times and even recreated the whole stateful set from scratch. Nothing helped.

I noticed that dev-spaces had installed an additional container named mindaro-proxy in the redis pods, and reading its logs I found that this container was intercepting and closing all the connections targeting the redis containers.

I then removed dev-spaces with the az aks remove-dev-spaces command and recreated the redis stateful set + pods; this time they don't have the mindaro-proxy container and they work fine like before.
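
For reference, the removal command looks roughly like this (resource group and cluster name are placeholders):

    # remove Azure Dev Spaces instrumentation from the cluster
    az aks remove-dev-spaces --resource-group <my-resource-group> --name <my-aks-cluster>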

The debugging feature is great and saves me a lot of time, but it comes with this bad side effect. It would be great if this problem could be solved.

Thank you

@stepro
Member

stepro commented Oct 24, 2018

Thanks for reporting this issue. Your analysis of the issue is correct and identifies a bug in the mindaro-proxy component. Do you have a Helm chart or raw Kubernetes yaml files you are using to install the specific redis cluster setup? This would help us greatly in recreating the problem on our side. Thanks!

@antogh
Author

antogh commented Oct 24, 2018

Hi @stepro, thanks. I'm writing a quick answer right before going into a meeting :(
You just need the stateful set to recreate the problem: if redis-0 isn't able to set a key to a value, you have the problem. It shows up the 2nd time you restart the stateful set.
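
For example, a quick check along those lines (a sketch, using the pod and container names from the manifest below; the key and value are arbitrary):

    # prints OK when redis-0 accepts writes as a master; errors out when the problem is present
    kubectl exec redis-0 -c redis -- redis-cli set probe-key probe-value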

Here is the manifest to create the redis stateful set (exported as JSON):

{
  "kind": "StatefulSet",
  "apiVersion": "apps/v1beta2",
  "metadata": {
    "name": "redis",
    "namespace": "default",
    "selfLink": "/apis/apps/v1beta2/namespaces/default/statefulsets/redis",
    "uid": "23c3d74d-d7a1-11e8-a77b-ae2b0ed1f96f",
    "resourceVersion": "3214234",
    "generation": 1,
    "creationTimestamp": "2018-10-24T15:26:12Z",
    "labels": {
      "app": "redis"
    },
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{"apiVersion":"apps/v1beta1","kind":"StatefulSet","metadata":{"annotations":{},"name":"redis","namespace":"default"},"spec":{"replicas":3,"serviceName":"redis","template":{"metadata":{"labels":{"app":"redis"}},"spec":{"containers":[{"command":["sh","-c","source /redis-config/init.sh"],"image":"redis:4.0.11-alpine","name":"redis","ports":[{"containerPort":6379,"name":"redis"}],"volumeMounts":[{"mountPath":"/redis-config","name":"config"},{"mountPath":"/redis-data","name":"data"}]},{"command":["sh","-c","source /redis-config-src/sentinel.sh"],"image":"redis:4.0.11-alpine","name":"sentinel","volumeMounts":[{"mountPath":"/redis-config-src","name":"config"},{"mountPath":"/redis-config","name":"data"}]}],"volumes":[{"configMap":{"defaultMode":420,"name":"redis-config"},"name":"config"},{"emptyDir":null,"name":"data"}]}}}}\n"
    }
  },
  "spec": {
    "replicas": 3,
    "selector": {
      "matchLabels": {
        "app": "redis"
      }
    },
    "template": {
      "metadata": {
        "creationTimestamp": null,
        "labels": {
          "app": "redis"
        }
      },
      "spec": {
        "volumes": [
          {
            "name": "config",
            "configMap": {
              "name": "redis-config",
              "defaultMode": 420
            }
          },
          {
            "name": "data",
            "emptyDir": {}
          }
        ],
        "containers": [
          {
            "name": "redis",
            "image": "redis:4.0.11-alpine",
            "command": [
              "sh",
              "-c",
              "source /redis-config/init.sh"
            ],
            "ports": [
              {
                "name": "redis",
                "containerPort": 6379,
                "protocol": "TCP"
              }
            ],
            "resources": {},
            "volumeMounts": [
              {
                "name": "config",
                "mountPath": "/redis-config"
              },
              {
                "name": "data",
                "mountPath": "/redis-data"
              }
            ],
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "imagePullPolicy": "IfNotPresent"
          },
          {
            "name": "sentinel",
            "image": "redis:4.0.11-alpine",
            "command": [
              "sh",
              "-c",
              "source /redis-config-src/sentinel.sh"
            ],
            "resources": {},
            "volumeMounts": [
              {
                "name": "config",
                "mountPath": "/redis-config-src"
              },
              {
                "name": "data",
                "mountPath": "/redis-config"
              }
            ],
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "imagePullPolicy": "IfNotPresent"
          }
        ],
        "restartPolicy": "Always",
        "terminationGracePeriodSeconds": 30,
        "dnsPolicy": "ClusterFirst",
        "securityContext": {},
        "schedulerName": "default-scheduler"
      }
    },
    "serviceName": "redis",
    "podManagementPolicy": "OrderedReady",
    "updateStrategy": {
      "type": "OnDelete"
    },
    "revisionHistoryLimit": 10
  },
  "status": {
    "observedGeneration": 1,
    "replicas": 3,
    "readyReplicas": 3,
    "currentReplicas": 3,
    "currentRevision": "redis-5bd6f7877b",
    "updateRevision": "redis-5bd6f7877b",
    "collisionCount": 0
  }
}

and here is the config map:

{
  "kind": "ConfigMap",
  "apiVersion": "v1",
  "metadata": {
    "name": "redis-config",
    "namespace": "default",
    "selfLink": "/api/v1/namespaces/default/configmaps/redis-config",
    "uid": "e147303d-cbd5-11e8-9b5c-6e6eccc149a1",
    "resourceVersion": "1625941",
    "creationTimestamp": "2018-10-09T15:13:30Z",
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration": "{"apiVersion":"v1","data":{"init.sh":"#!/bin/bash\nif [[ ${HOSTNAME} == 'redis-0' ]]\nthen\n redis-server /redis-config/master.conf\nelse\n redis-server /redis-config/slave.conf\nfi","master.conf":"bind 0.0.0.0\nport 6379\n\ndir /redis-data","sentinel.conf":"bind 0.0.0.0\nport 26379\n\nsentinel monitor redis redis-0.redis 6379 2\nsentinel parallel-syncs redis 1\nsentinel down-after-milliseconds redis 10000\nsentinel failover-timeout redis 20000","sentinel.sh":"#!/bin/bash\ncp /redis-config-src/. /redis-config\nwhile ! ping -c 1 redis-0.redis; do\n echo 'Waiting for server'\n sleep 1\ndone\n\nredis-sentinel /redis-config/sentinel.conf","slave.conf":"bind 0.0.0.0\nport 6379\n\ndir .\n\nslaveof redis-0.redis 6379"},"kind":"ConfigMap","metadata":{"annotations":{},"creationTimestamp":null,"name":"redis-config","namespace":"default"}}\n"
    }
  },
  "data": {
    "init.sh": "#!/bin/bash\nif [[ ${HOSTNAME} == 'redis-0' ]]\nthen\n redis-server /redis-config/master.conf\nelse\n redis-server /redis-config/slave.conf\nfi",
    "master.conf": "bind 0.0.0.0\nport 6379\n\ndir /redis-data",
    "sentinel.conf": "bind 0.0.0.0\nport 26379\n\nsentinel monitor redis redis-0.redis 6379 2\nsentinel parallel-syncs redis 1\nsentinel down-after-milliseconds redis 10000\nsentinel failover-timeout redis 20000",
    "sentinel.sh": "#!/bin/bash\ncp /redis-config-src/. /redis-config\nwhile ! ping -c 1 redis-0.redis; do\n echo 'Waiting for server'\n sleep 1\ndone\n\nredis-sentinel /redis-config/sentinel.conf",
    "slave.conf": "bind 0.0.0.0\nport 6379\n\ndir .\n\nslaveof redis-0.redis 6379"
  }
}

@stepro
Member

stepro commented Oct 24, 2018

Thanks, I'll take a look.

@YuzorMa changed the title from "dev-spaces interfers with Redis" to "dev-spaces interferes with Redis" on Oct 25, 2018
@stepro
Member

stepro commented Oct 25, 2018

Thanks @antogh for your patience on this issue. I needed to create a headless service object and fix a problem in the sentinel.sh script (the cp /redis-config-src/. /redis-config command didn't work; it needed to be cp /redis-config-src/* /redis-config) to get to the point where I could reproduce the issue.
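
For anyone else trying to reproduce this, a minimal headless Service matching the stateful set above might look like the sketch below (the selector, service name, and port 6379 come from the manifest in this thread; port 26379 comes from sentinel.conf; the port names are assumptions):

    apiVersion: v1
    kind: Service
    metadata:
      name: redis            # must match the StatefulSet's serviceName
      namespace: default
    spec:
      clusterIP: None        # headless, so redis-0.redis etc. resolve to pod IPs
      selector:
        app: redis
      ports:
      - name: redis
        port: 6379
      - name: sentinel
        port: 26379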

Unfortunately, the issue here is a general problem that occurs when injecting any kind of intercepting proxy such as for dev spaces or other solutions like istio. I believe the specific problem is that when an intended slave (e.g. redis-1) connects to its master (e.g. redis-0) to register itself as a slave, the master uses the getpeername() API to determine the IP and port of the slave. When this is done through an intercepting proxy, this IP and port always end up being the master's IP and port, and the master then ends up turning itself into a slave. This causes the whole system to be stuck in the initialization phase.
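
One way to observe this from the outside (a sketch, using the pod and container names from the manifest earlier in this thread) is to ask redis-0 what role it thinks it has:

    # a healthy deployment reports role:master here; with the intercepting proxy injected,
    # redis-0 ends up reporting itself as a slave and initialization never completes
    kubectl exec redis-0 -c redis -- redis-cli info replication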

The closest related issue I could find was this one for istio, where you'll notice the attached yaml files already disable the istio sidecar from the master and slave pods using a special istio annotation. The Helm chart did not generate these annotations so I'm not sure how it was determined that istio needed to be disabled for these pods. The actual issue here looks to be some problem with istio still getting in the way when it was told to get out of the way.
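
For reference, the istio opt-out mentioned there is a pod template annotation along these lines:

    # set in the pod template metadata of the StatefulSet to skip istio sidecar injection
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"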

For dev spaces, we do not currently have a mechanism for a pod to opt out of being instrumented for dev spaces with the sidecar proxy. Your best option would be to run the redis cache in a different Kubernetes namespace that has not been upgraded to a dev space. We will look into providing a label or annotation similar to istio that will allow you to opt out of the sidecar proxy for specific pods.
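
As a rough sketch of that workaround (the namespace name and file names are placeholders, and the exported manifests above would first need their hard-coded "namespace": "default" fields removed or updated):

    # run redis in a namespace that has not been upgraded to a dev space
    kubectl create namespace redis-prod
    kubectl apply -n redis-prod -f redis-configmap.json -f redis-statefulset.json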

@antogh
Author

antogh commented Oct 25, 2018

Thanks @stepro
I read your message with interest; it makes total sense and corresponds to what I found out.

I tried some hacks to have the redis pods opt out of the mindaro-proxy. Unfortunately kubernetes does not allow removing a container from a pod by updating its yaml, so I tried changing the image name to a neutral "alpine" image, and it worked for a while (the redis log showed a successful initialization), but then the aks agent noticed that the hash for the mindaro-proxy container had changed and restarted the whole pod, causing an endless crash loop :(

In the end I came to the same conclusion you suggested: placing redis pods into a different namespace not affected by dev spaces. Redis works fine again now.

But unfortunately the problems never end. Now VS does not debug with azure dev spaces anymore. It worked fine the 1st time I tried; now it doesn't work at all. I completely removed redis and the new namespace, but the problem persists. VS is able to create the SVC and DEPLOYMENT on the cluster but then fails (after 10 minutes of silence) to create the POD with the actual application that would be port-forwarded to my local machine. It seems to be a communication problem. VS can create the container locally without problems, so it's not a local docker issue; it just can't send the container image to the pod in the cluster.

Do you have any idea what it could be? Remote debugging inside kubernetes is really precious for speeding up development; I'd really like to use this feature.

BTW I opened another issue here about this problem.

Thanks again

@stepro
Member

stepro commented Oct 25, 2018

I just discovered this article and will be investigating if there is anything we can do to make this scenario work.

Thanks for opening the other issue - someone on the team familiar with these connectivity issues will be able to help you.

@antogh
Author

antogh commented Oct 26, 2018

@stepro
Interesting article.

However, after one day of pain, I'm very happy with the setup I have now; it works like a charm.
I have created a dedicated namespace for dev spaces and installed it there. Now it cannot interfere with other pods anymore. In the meantime the communication problems I mentioned yesterday have been resolved (it seems some maintenance was going on in the West Europe region, where my AKS is), and I can debug flawlessly from my custom namespace while the app I'm debugging interacts with all the other pods in other namespaces.
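
In case it helps others, selecting a dedicated space looked roughly like this (resource group, cluster, and space names are placeholders, and I'm assuming the --space option of az aks use-dev-spaces):

    # enable/select Dev Spaces against a dedicated space instead of the default namespace
    az aks use-dev-spaces --resource-group <my-resource-group> --name <my-aks-cluster> --space <my-dev-space>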

Allow me to give you a suggestion: I would write a disclaimer in the dev spaces doc here:
https://docs.microsoft.com/en-us/azure/dev-spaces/get-started-netcore-visualstudio
https://docs.microsoft.com/en-us/azure/dev-spaces/troubleshooting

something like:
This service is still in preview and we are continuously working on improvements. At the current stage, the proxy agent that allows remote debugging might interfere with some other pods in the same namespace (we know this happens with redis master/slave stateful sets). If you encounter this problem, please install az dev spaces in a dedicated namespace, separate from the other pods.

lisaguthrie added a commit to lisaguthrie/azure-docs that referenced this issue Oct 29, 2018
@lisaguthrie
Collaborator

Thanks @antogh - I've submitted a request to get this added to our troubleshooting documentation.

@AceHack

AceHack commented Nov 9, 2018

Please add an annotation to disable this as soon as possible; that would be a great feature.

@stepro
Member

stepro commented Nov 9, 2018

We've checked in the ability to disable it, and it should be available in a couple of weeks.

@YuzorMa
Contributor

YuzorMa commented Jan 15, 2019

This should be fixed in the latest versions of Dev Spaces. Please let us know if you continue to see issues.

@YuzorMa closed this as completed on Jan 15, 2019