flanneld.service does not use /etc/hosts anymore #1565

Closed
tyranron opened this Issue Sep 12, 2016 · 0 comments


Issue Report

Bug

After upgrading to CoreOS 1122.2.0, flanneld.service no longer sees the host's /etc/hosts, so etcd cluster nodes cannot be specified by their /etc/hosts aliases, only by IP addresses or real (DNS-resolvable) domain names.
On 1068.10.0 this worked fine.

CoreOS Version

$ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1122.2.0
VERSION_ID=1122.2.0
BUILD_ID=2016-09-06-1449
PRETTY_NAME="CoreOS 1122.2.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

Digital Ocean 2 GB / 40 GB Disk / AMS3 - CoreOS 1122.2.0 (stable)

With an /etc/hosts like:

127.0.0.1 cluster-node-01
178.62.235.191 cluster-node-02

And a cloud-config section like:

...
coreos:
  ...
  flannel:
    ...
    etcd_endpoints: http://cluster-node-01:4001,https://cluster-node-02:2379
    ...
  ...
...

Expected Behavior

flanneld resolves the etcd endpoint aliases through /etc/hosts and starts normally, as it did on 1068.10.0.

Actual Behavior

flanneld never finishes starting; systemctl status flanneld shows the following error:

● flanneld.service - Network fabric for containers
   Loaded: loaded (/usr/lib64/systemd/system/flanneld.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/flanneld.service.d
           └─50-network-config.conf
   Active: activating (start) since Mon 2016-09-12 06:26:35 UTC; 24s ago
     Docs: https://github.com/coreos/flannel
  Process: 9212 ExecStartPre=/usr/bin/etcdctl set /flannel/network/config {"Network":"10.2.0.0/16","Backend":{"Type":"vxlan"}} (code=exited, status=0/SUCCESS)
  Process: 9210 ExecStartPre=/usr/bin/mkdir -p ${ETCD_SSL_DIR} (code=exited, status=0/SUCCESS)
  Process: 9207 ExecStartPre=/usr/bin/mkdir -p /run/flannel (code=exited, status=0/SUCCESS)
  Process: 9196 ExecStartPre=/sbin/modprobe ip_tables (code=exited, status=0/SUCCESS)
 Main PID: 9223 (flanneld)
    Tasks: 5
   Memory: 15.8M
      CPU: 1.222s
   CGroup: /system.slice/flanneld.service
           └─9223 /opt/bin/flanneld --ip-masq=true

Sep 12 06:26:50 cluster-node-01 rkt[9223]: E0912 06:26:50.680935 09223 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Sep 12 06:26:51 cluster-node-01 rkt[9223]: E0912 06:26:51.691050 09223 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
...

Other Information

The etcd cluster is reachable and healthy.
etcdctl cluster-health:

member 8116a2a3963d30ee is healthy: got healthy result from https://cluster-node-02:2379
member aa5bbeea51d0359f is healthy: got healthy result from https://cluster-node-01:2379
cluster is healthy

fleetctl list-machines:

MACHINE      IP              METADATA
94ec36b1...  178.62.235.191  hostname=cluster-node-02
c89a4bb7...  178.62.235.13   hostname=cluster-node-01

Changing the cloud-config section to:

...
coreos:
  ...
  flannel:
    ...
    etcd_endpoints: http://127.0.0.1:4001,https://178.62.235.191:2379
    ...
  ...
...

fixes the problem and everything is OK.
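As a stopgap, the alias-to-IP substitution above can be scripted. The following is only a minimal sketch: `resolve_endpoints` is a hypothetical helper name, and it assumes a plain /etc/hosts layout (IP address first, then names). It rewrites each host in a comma-separated endpoint list to the first matching IP in the hosts file and leaves unknown hosts unchanged.

```shell
# Hypothetical helper: replace /etc/hosts aliases in an etcd endpoint list
# with their IP addresses. Usage: resolve_endpoints ENDPOINTS [HOSTS_FILE]
resolve_endpoints() {
  endpoints="$1"
  hosts_file="${2:-/etc/hosts}"
  awk -v eps="$endpoints" '
    { sub(/#.*/, "") }                      # drop comments
    NF >= 2 {                               # IP followed by one or more names
      for (i = 2; i <= NF; i++)
        if (!($i in ip)) ip[$i] = $1        # first entry wins
    }
    END {
      n = split(eps, parts, ",")
      out = ""
      for (k = 1; k <= n; k++) {
        part = parts[k]
        s = index(part, "://")
        scheme = (s ? substr(part, 1, s + 2) : "")
        rest = (s ? substr(part, s + 3) : part)
        c = index(rest, ":")
        host = (c ? substr(rest, 1, c - 1) : rest)
        tail = (c ? substr(rest, c) : "")   # ":port" if present
        if (host in ip) host = ip[host]
        out = out (k > 1 ? "," : "") scheme host tail
      }
      print out
    }' "$hosts_file"
}
```

For example, `resolve_endpoints "http://cluster-node-01:4001,https://cluster-node-02:2379"` should produce the IP-based value used above when run against the /etc/hosts shown earlier. Whether TLS verification still passes for the https endpoint depends on the certificate also covering the IP address.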

Looking at the output of systemctl cat flanneld, we see:

...
ExecStart=/usr/bin/rkt run --net=host \
   --stage1-path=/usr/lib/rkt/stage1-images/stage1-fly.aci \
   --insecure-options=image \
   --set-env=NOTIFY_SOCKET=/run/systemd/notify \
   --inherit-env=true \
   --volume runsystemd,kind=host,source=/run/systemd,readOnly=false \
   --volume runflannel,kind=host,source=/run/flannel,readOnly=false \
   --volume ssl,kind=host,source=${ETCD_SSL_DIR},readOnly=true \
   --volume certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
   --volume resolv,kind=host,source=/etc/resolv.conf,readOnly=true \
   --mount volume=runsystemd,target=/run/systemd \
   --mount volume=runflannel,target=/run/flannel \
   --mount volume=ssl,target=${ETCD_SSL_DIR} \
   --mount volume=certs,target=/etc/ssl/certs \
   --mount volume=resolv,target=/etc/resolv.conf \
   ${FLANNEL_IMG}:${FLANNEL_VER} \
   --exec /opt/bin/flanneld \
   -- --ip-masq=true
...

So /etc/hosts is not mounted into the container (only /etc/resolv.conf is), which is why the endpoints cannot be resolved by their aliases.
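The same check can be done mechanically. This is a rough sketch, with `unit_mounts_host_path` being a hypothetical helper name: it reads unit text on stdin and succeeds only if some rkt `--volume` option uses the given host path as its `source=`.

```shell
# Hypothetical helper: exit 0 if the unit text on stdin contains an rkt
# --volume whose source= is the given host path, non-zero otherwise.
# The path is used as part of a regular expression, so "." matches any character.
unit_mounts_host_path() {
  grep -Eq -- "--volume[ =][^ ]*source=$1(,|[[:space:]]|\$)"
}
```

For instance, `systemctl cat flanneld | unit_mounts_host_path /etc/hosts || echo "/etc/hosts is not mounted"` would be expected to print the message for the unit shown above, while the same check for /etc/resolv.conf would pass.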

Proposals to fix

  1. Just mount /etc/hosts into the container.

  2. Provide this capability via an environment variable such as RKT_OPTS in the flanneld.service declaration, so that anyone who needs additional mounts or parameters for flanneld can add them easily via a drop-in file like:

    Environment="RKT_OPTS=--volume=etc-hosts,kind=host,source=/etc/hosts --mount volume=etc-hosts,target=/etc/hosts"
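To make the second proposal concrete, here is a sketch of how such a drop-in could be created. It rests on two assumptions: RKT_OPTS is only the variable proposed in this issue, so it has no effect unless flanneld.service is changed to expand it in its rkt run command, and the file name 60-etc-hosts.conf and the function name `write_hosts_dropin` are made up for illustration.

```shell
# Hypothetical: writes a drop-in that sets the proposed RKT_OPTS variable.
# The stock flanneld.service shown above does not read RKT_OPTS.
# Usage: write_hosts_dropin [DROPIN_DIR]
write_hosts_dropin() {
  dropin_dir="${1:-/etc/systemd/system/flanneld.service.d}"
  mkdir -p "$dropin_dir"
  # Quoted heredoc delimiter keeps the content literal (no shell expansion).
  cat > "$dropin_dir/60-etc-hosts.conf" <<'EOF'
[Service]
Environment="RKT_OPTS=--volume=etc-hosts,kind=host,source=/etc/hosts --mount volume=etc-hosts,target=/etc/hosts"
EOF
}
```

After calling `write_hosts_dropin` as root, the change would be picked up with `systemctl daemon-reload` followed by `systemctl restart flanneld`.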
    