This repository has been archived by the owner on Nov 30, 2021. It is now read-only.

Compatibility with Calico / Canal CNI networking #442

Closed
felixbuenemann opened this issue Aug 11, 2016 · 12 comments

@felixbuenemann
Contributor

I tried to switch my coreos-kubernetes 1.3.4 / Deis Workflow 2.3.0 based cluster to use Calico networking today.

This worked fine for the whole cluster, except for Dockerfile builds, which had problems with network connectivity.

My guess would be that the docker-in-docker builder does not use the proper CNI network plugin.

@aboyett

aboyett commented Aug 18, 2016

I hit this issue too, running on a k8s 1.3.4 cluster in AWS w/ Calico networking and Workflow v2.3.0. I couldn't deploy applications via git push deis or deis pull. The issue seems to stem from an inability to reach the deis-registry-proxy via the host network when Calico is used. Once I reconfigured the cluster to use flannel, the issue disappeared.

I don't have time to further debug this at the moment but feel free to ping me if you need a test subject when trying to resolve this issue.

For reference I've included the errors I saw in this scenario.

From my local system:

$ deis pull deis/example-go -a frozen-jumpsuit
Creating build... Error: Unknown Error (400): {"detail":"Put http://localhost:5555/v1/repositories/frozen-jumpsuit/: dial tcp 127.0.0.1:5555: getsockopt: connection refused"}

From the deis controller:

172.29.72.5 "POST /v2/hooks/push HTTP/1.1" 201 380 "deis-builder"
172.29.72.5 "POST /v2/hooks/config HTTP/1.1" 200 222 "deis-builder"
INFO [frozen-jumpsuit]: config frozen-jumpsuit-092f71c updated
INFO [frozen-jumpsuit]: andy created initial release
INFO [frozen-jumpsuit]: domain frozen-jumpsuit added
172.29.72.6 "POST /v2/apps/ HTTP/1.1" 201 166 "Deis Client vv2.4.0"
INFO [frozen-jumpsuit]: build frozen-jumpsuit-5d50cc4 created
INFO [frozen-jumpsuit]: andy deployed deis/example-go
INFO Pulling Docker image deis/example-go:latest
INFO Tagging Docker image deis/example-go:latest as localhost:5555/frozen-jumpsuit:v2
INFO Pushing Docker image localhost:5555/frozen-jumpsuit:v2
INFO [frozen-jumpsuit]: Put http://localhost:5555/v1/repositories/frozen-jumpsuit/: dial tcp 127.0.0.1:5555: getsockopt: connection refused
ERROR:root:Put http://localhost:5555/v1/repositories/frozen-jumpsuit/: dial tcp 127.0.0.1:5555: getsockopt: connection refused
Traceback (most recent call last):
  File "/app/api/models/release.py", line 88, in new
    release.publish()
  File "/app/api/models/release.py", line 135, in publish
    publish_release(source_image, self.image, deis_registry, self.get_registry_auth())
  File "/app/registry/dockerclient.py", line 196, in publish_release
    return DockerClient().publish_release(source, target, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 117, in publish_release
    self.push("{}/{}".format(self.registry, name), tag)
  File "/app/registry/dockerclient.py", line 133, in push
    log_output(stream, 'push', repo, tag)
  File "/app/registry/dockerclient.py", line 175, in log_output
    stream_error(chunk, operation, repo, tag)
  File "/app/registry/dockerclient.py", line 192, in stream_error
    raise RegistryException(message)
registry.dockerclient.RegistryException: Put http://localhost:5555/v1/repositories/frozen-jumpsuit/: dial tcp 127.0.0.1:5555: getsockopt: connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/build.py", line 62, in create
    source_version=self.version
  File "/app/api/models/release.py", line 95, in new
    raise DeisException(str(e)) from e
api.exceptions.DeisException: Put http://localhost:5555/v1/repositories/frozen-jumpsuit/: dial tcp 127.0.0.1:5555: getsockopt: connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 463, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 176, in create
    return super(AppResourceViewSet, self).create(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 251, in post_save
    self.release = build.create(self.request.user)
  File "/app/api/models/build.py", line 71, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: Put http://localhost:5555/v1/repositories/frozen-jumpsuit/: dial tcp 127.0.0.1:5555: getsockopt: connection refused
172.29.72.6 "POST /v2/apps/frozen-jumpsuit/builds/ HTTP/1.1" 400 128 "Deis Client vv2.4.0"
INFO GET http://172.28.18.182:80/logs/hello-world?log_lines=100 returned a 204 status code
172.29.72.6 "GET /v2/apps/hello-world/logs HTTP/1.1" 204 - "Deis Client vv2.4.0"

@bacongobbler
Member

bacongobbler commented Sep 19, 2016

So it looks like Calico networking disallows the host from communicating with pod IPs. That's a constraint we weren't aware of: we assumed that service IPs are reachable across the entire cluster, both from pods and from hosts.

I'm not sure what the resolution would be here. Perhaps there's some way to allow worker nodes to communicate with service IPs when Calico networking is enabled?
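A quick, dependency-free way to check that kind of reachability from a worker node is a raw TCP connect via bash (a sketch; port 5555 is the registry-proxy hostPort from the logs above, and which IPs to probe depends entirely on your cluster):

```shell
# Probe TCP reachability using bash's /dev/tcp redirection, so it works on
# minimal hosts (like CoreOS) without curl or nc installed.
# check_port HOST PORT -> exit 0 if a TCP connect succeeds within 2 seconds.
check_port() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# On an affected node this fails for the registry-proxy hostPort, matching
# the "connection refused" seen in the controller logs:
check_port 127.0.0.1 5555 && echo "reachable" || echo "unreachable"
```

The same function can be pointed at a pod IP, a service IP, and the node's own hostPort in turn, to narrow down which hop the network plugin is breaking.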

@felixbuenemann
Contributor Author

@bacongobbler Have you set up a k8s cluster with canal networking to check what's going on? (You can do that with kube-aws)

@bacongobbler
Member

I have not. However, I recall someone inside the Deis org (or in #community on Slack) reaching out; we debugged their networking issues, and it turned out that host -> pod networking was failing with Calico/Canal networking.

@felixbuenemann
Contributor Author

felixbuenemann commented Oct 19, 2016

I am seeing the same behavior on a kube-aws / k8s 1.4.1 cluster with Workflow v2.7.0 and flannel.

Deploying buildpack apps works, but deis pull deis/example-go -a quacky-crawfish fails with:

INFO [quacky-crawfish]: config quacky-crawfish-9aec233 updated
10.2.15.11 "GET /v2/apps/?limit=100 HTTP/1.1" 200 223 "Deis Client v2.7.0"
INFO [quacky-crawfish]: build quacky-crawfish-34a2219 created
INFO [quacky-crawfish]: buenemann deployed deis/example-go
INFO Pulling Docker image deis/example-go:latest
INFO Tagging Docker image deis/example-go:latest as localhost:5555/quacky-crawfish:v4
INFO Pushing Docker image localhost:5555/quacky-crawfish:v4
INFO Pushing Docker image localhost:5555/quacky-crawfish:v4
INFO Pushing Docker image localhost:5555/quacky-crawfish:v4
INFO [quacky-crawfish]: Put http://localhost:5555/v1/repositories/quacky-crawfish/: dial tcp 127.0.0.1:5555: getsockopt: connection refused
ERROR:root:Put http://localhost:5555/v1/repositories/quacky-crawfish/: dial tcp 127.0.0.1:5555: getsockopt: connection refused
Traceback (most recent call last):
  File "/app/api/models/release.py", line 89, in new
    release.publish()
  File "/app/api/models/release.py", line 136, in publish
    publish_release(source_image, self.image, deis_registry, self.get_registry_auth())
  File "/app/registry/dockerclient.py", line 195, in publish_release
    return DockerClient().publish_release(source, target, deis_registry, creds)
  File "/app/registry/dockerclient.py", line 114, in publish_release
    self.push("{}/{}".format(self.registry, name), tag)
  File "/usr/local/lib/python3.5/dist-packages/backoff.py", line 286, in retry
    ret = target(*args, **kwargs)
  File "/app/registry/dockerclient.py", line 131, in push
    log_output(stream, 'push', repo, tag)
  File "/app/registry/dockerclient.py", line 174, in log_output
    stream_error(chunk, operation, repo, tag)
  File "/app/registry/dockerclient.py", line 191, in stream_error
    raise RegistryException(message)
registry.dockerclient.RegistryException: Put http://localhost:5555/v1/repositories/quacky-crawfish/: dial tcp 127.0.0.1:5555: getsockopt: connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/api/models/build.py", line 62, in create
    source_version=self.version
  File "/app/api/models/release.py", line 96, in new
    raise DeisException(str(e)) from e
api.exceptions.DeisException: Put http://localhost:5555/v1/repositories/quacky-crawfish/: dial tcp 127.0.0.1:5555: getsockopt: connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/views.py", line 471, in dispatch
    response = handler(request, *args, **kwargs)
  File "/app/api/views.py", line 181, in create
    return super(AppResourceViewSet, self).create(request, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/rest_framework/mixins.py", line 21, in create
    self.perform_create(serializer)
  File "/app/api/viewsets.py", line 21, in perform_create
    self.post_save(obj)
  File "/app/api/views.py", line 258, in post_save
    self.release = build.create(self.request.user)
  File "/app/api/models/build.py", line 78, in create
    raise DeisException(str(e)) from e
api.exceptions.DeisException: Put http://localhost:5555/v1/repositories/quacky-crawfish/: dial tcp 127.0.0.1:5555: getsockopt: connection refused
10.2.15.11 "POST /v2/apps/quacky-crawfish/builds/ HTTP/1.1" 400 128 "Deis Client v2.7.0"

@bacongobbler
Member

@felixbuenemann @jdumars and I were actually talking about this in PM on slack. Can you try to jump onto one of the worker nodes and determine if there is something listening on port 5555 on the host? In our debugging we found that for some reason kubernetes did not allocate port 5555 on the host for the registry-proxy pod, which would explain why the connection is being refused.

@bacongobbler
Member

bacongobbler commented Oct 19, 2016

Also note that this seems to be CoreOS-specific. We're not seeing this issue on other providers using kube-up (Vagrant, AWS, and GKE) or Minikube, which are the four providers in our release test matrix. Vagrant uses Fedora, AWS uses Ubuntu, GKE uses Debian, and Minikube uses a custom ISO.

@felixbuenemann
Contributor Author

@bacongobbler I ran netstat -tan | grep 5555 and it returned nothing, so the port doesn't seem to be open.
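An empty netstat is suggestive but not conclusive on its own: kubelet-managed hostPort mappings are usually programmed as iptables DNAT rules in the nat table rather than as userland listening sockets. A second check on the node would be `sudo iptables-save -t nat | grep 5555`. The snippet below only simulates that check against a made-up rule (the chain name and destination IP are hypothetical) to show what a working mapping would look like; on an affected host the real grep comes back empty:

```shell
# Hypothetical iptables-save output for a functioning hostPort 5555 mapping.
# On a node where the CNI plugin lacks hostPort support, no such rule is created.
nat_rules='-A KUBE-HOSTPORTS -p tcp -m tcp --dport 5555 -j DNAT --to-destination 10.2.3.4:80'

# Count DNAT rules matching the port; 0 would mean no hostPort mapping exists.
echo "$nat_rules" | grep -c 'dport 5555'
```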

@aboyett

aboyett commented Oct 19, 2016

I think I was the little birdy @bacongobbler mentioned. My investigation led me to an upstream issue that appears relevant: projectcalico/k8s-exec-plugin#52 (comment). I planned to investigate in more depth but haven't had a chance to work with Calico/Canal networking again.

@kmala
Contributor

kmala commented Oct 19, 2016

I think this could be related to kubernetes/kubernetes#31307, which covers issues with host ports when using a CNI plugin.

@bacongobbler
Member

Another related upstream issue is kubernetes/kubernetes#23920, so host ports don't seem to work with some providers (like CoreOS).
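One generic way around missing hostPort support, independent of any particular patch, is to run the registry-proxy with host networking so the kubelet never has to program a port mapping at all. A sketch of the relevant pod spec fragment (field names are standard Kubernetes; the actual deis-registry-proxy manifest may differ):

```yaml
# Sketch only: with hostNetwork the container shares the node's network
# namespace, so its listener is reachable on the node's port 5555 directly
# and no CNI/kubelet hostPort plumbing is involved.
spec:
  hostNetwork: true
  containers:
  - name: registry-proxy
    ports:
    - containerPort: 5555
```

The trade-off is that the container now competes for the node's port namespace, so the port must be free on every node where the pod is scheduled.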

@bacongobbler
Member

Since we seem to have found the root cause, I'm going to close this as a duplicate of deis/registry#64, where I've posted a patch that users can try to bypass the issue. Thank you everyone for all your help!
