flannel started in rkt fails to contact AWS endpoints for aws-vpc #1436

Closed
monder opened this Issue Jul 3, 2016 · 4 comments

monder commented Jul 3, 2016

Issue Report

#1302 #1429

Bug

flannel fails to start in an rkt container when the aws-vpc backend is configured.
It seems that calls to the AWS APIs require some specific image configuration:

caused by: Post https://ec2.eu-west-1.amazonaws.com/: dial tcp: lookup ec2.eu-west-1.amazonaws.com: no such host

If I only add resolv.conf (as per rkt/#2141), the error becomes:

caused by: Post https://ec2.eu-west-1.amazonaws.com/: x509: failed to load system roots and no roots provided
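The two errors above point at two different files missing inside the container: the DNS failure at /etc/resolv.conf, and the x509 failure at the CA bundle. As a rough sketch, the mapping can be written as a small filter over log text. The function name `classify_flannel_error` is hypothetical and not part of flannel or rkt, and the file paths are the ones used in the workaround later in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of flannel or rkt): map the two error
# strings quoted in this report to the file that is probably missing
# inside the container. Reads log text on stdin, prints one hint per match.
classify_flannel_error() {
    while IFS= read -r line; do
        case "$line" in
            *"no such host"*)
                # DNS lookup failed: the resolver has no configuration.
                echo "dns: /etc/resolv.conf is likely missing in the container" ;;
            *"x509: failed to load system roots"*)
                # TLS verification failed: no CA bundle was found.
                echo "tls: /etc/ssl/certs/ca-certificates.crt is likely missing in the container" ;;
        esac
    done
}
```

Assuming the unit is named flanneld, something like `journalctl -u flanneld | classify_flannel_error` would print a hint for each failing request. Lines that match neither pattern are ignored.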

CoreOS Version

$ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1097.0.0
VERSION_ID=1097.0.0
BUILD_ID=2016-07-02-0145
PRETTY_NAME="CoreOS 1097.0.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

AWS

etcdctl get /coreos.com/network/config
{"Network":"10.4.0.0/16","Backend":{"Type":"aws-vpc"}}

Expected Behavior

flannel starts and correctly registers a new subnet.

Actual Behavior

flannel does not start:

Jul 03 15:10:39 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:39 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:40 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:40 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: E0703 15:10:40.186100 02262 network.go:71] Failed to initialize network  (type aws-vpc): error getting instance info: RequestError: send request failed
Jul 03 15:10:40 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: caused by: Post https://ec2.eu-west-1.amazonaws.com/: dial tcp: lookup ec2.eu-west-1.amazonaws.com: no such host
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: I0703 15:10:41.195294 02262 etcd.go:129] Found lease (10.4.102.0/24) for current IP (10.0.1.220), reusing
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: I0703 15:10:41.204110 02262 etcd.go:84] Subnet lease acquired: 10.4.102.0/24
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: I0703 15:10:41.506790 02262 awsvpc.go:100] Warning- disabling source destination check failed: RequestError: send request failed
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: caused by: Post https://ec2.eu-west-1.amazonaws.com/: dial tcp: lookup ec2.eu-west-1.amazonaws.com: no such host
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: I0703 15:10:41.508516 02262 awsvpc.go:104] RouteTableID not passed as config parameter, detecting ...
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: E0703 15:10:41.863974 02262 network.go:71] Failed to initialize network  (type aws-vpc): error getting instance info: RequestError: send request failed
Jul 03 15:10:41 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: caused by: Post https://ec2.eu-west-1.amazonaws.com/: dial tcp: lookup ec2.eu-west-1.amazonaws.com: no such host
Jul 03 15:10:42 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: I0703 15:10:42.870792 02262 etcd.go:129] Found lease (10.4.102.0/24) for current IP (10.0.1.220), reusing
Jul 03 15:10:42 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: I0703 15:10:42.888657 02262 etcd.go:84] Subnet lease acquired: 10.4.102.0/24
Jul 03 15:10:42 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:42 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:43 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:43 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: I0703 15:10:43.191013 02262 awsvpc.go:100] Warning- disabling source destination check failed: RequestError: send request failed
Jul 03 15:10:43 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: caused by: Post https://ec2.eu-west-1.amazonaws.com/: dial tcp: lookup ec2.eu-west-1.amazonaws.com: no such host
Jul 03 15:10:43 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: I0703 15:10:43.192719 02262 awsvpc.go:104] RouteTableID not passed as config parameter, detecting ...
Jul 03 15:10:43 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:43 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
Jul 03 15:10:43 ip-10-0-1-220.eu-west-1.compute.internal rkt[2262]: 0x444540
$ rkt list
9fe82244    flannel quay.io/coreos/flannel:0.5.5            exited  46 minutes ago  46 minutes ago
9ffeb968    flannel quay.io/coreos/flannel:0.5.5            exited  4 minutes ago   4 minutes ago
a29eb8e1    flannel quay.io/coreos/flannel:0.5.5            exited  35 minutes ago  35 minutes ago
a49cf500    flannel quay.io/coreos/flannel:0.5.5            exited  40 minutes ago  40 minutes ago
aab64bd5    flannel quay.io/coreos/flannel:0.5.5            exited  22 minutes ago  22 minutes ago

monder commented Jul 3, 2016

Adding both:

--volume=resolv,kind=host,source=/etc/resolv.conf,readOnly=true \
--mount volume=resolv,target=/etc/resolv.conf \
--volume cacert,kind=host,source=/etc/ssl/certs/ca-certificates.crt,readOnly=true \
--mount volume=cacert,target=/etc/ssl/certs/ca-certificates.crt \

seems to fix the issue, but I suppose generating ca-certificates should be an image build step.
Here is the full workaround:
/etc/systemd/system/flanneld.service.d/99-fix-flannel.conf

[Service]
ExecStart=
ExecStart=/usr/bin/rkt run --net=host \
   --stage1-path=/usr/lib/rkt/stage1-images/stage1-fly.aci \
   --insecure-options=image \
   --set-env=NOTIFY_SOCKET=/run/systemd/notify \
   --inherit-env=true \
   --volume runsystemd,kind=host,source=/run/systemd,readOnly=false \
   --volume runflannel,kind=host,source=/run/flannel,readOnly=false \
   --volume ssl,kind=host,source=${ETCD_SSL_DIR},readOnly=true \
   --volume=resolv,kind=host,source=/etc/resolv.conf,readOnly=true \
   --mount volume=resolv,target=/etc/resolv.conf \
   --volume cacert,kind=host,source=/etc/ssl/certs/ca-certificates.crt,readOnly=true \
   --mount volume=cacert,target=/etc/ssl/certs/ca-certificates.crt \
   --mount volume=runsystemd,target=/run/systemd \
   --mount volume=runflannel,target=/run/flannel \
   --mount volume=ssl,target=${ETCD_SSL_DIR} \
   ${FLANNEL_IMG}:${FLANNEL_VER} \
   -- --ip-masq=true
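The drop-in above relies on two host files being present so that rkt can mount them into the container. A minimal pre-flight sketch, assuming the paths from the workaround, is shown here. The function name `check_volume_sources` is hypothetical and not part of rkt or flannel.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of rkt or flannel): confirm that
# every host file used as a --volume source exists and is non-empty.
# Usage: check_volume_sources FILE...
# Returns 0 if all files are usable, 1 otherwise.
check_volume_sources() {
    missing=0
    for path in "$@"; do
        if [ -s "$path" ]; then
            echo "ok: $path"
        else
            echo "missing or empty: $path"
            missing=1
        fi
    done
    return "$missing"
}
```

On the host this could be called as `check_volume_sources /etc/resolv.conf /etc/ssl/certs/ca-certificates.crt` before restarting the service. After adding or changing the drop-in, systemd needs `systemctl daemon-reload` followed by a restart of the flanneld unit to pick it up.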

mischief commented Jul 3, 2016

@ajeddeloh

ajeddeloh commented Jul 5, 2016

Looks like rkt doesn't overlay /etc/resolv.conf the way Docker does, hence the failure to contact AWS endpoints. The SSL errors were caused by me missing a line when converting the early-docker version to rkt. Should be fixed here: coreos/coreos-overlay#2043

ajeddeloh commented Jul 6, 2016

Closed via coreos/coreos-overlay#2043. Will be included in the next alpha.

ajeddeloh closed this Jul 6, 2016
