This repository has been archived by the owner. It is now read-only.

flanneld.service does not use /etc/hosts anymore #1565

Closed
tyranron opened this issue Sep 12, 2016 · 0 comments

Issue Report

Bug

After upgrading to CoreOS 1122.2.0, flanneld.service no longer sees /etc/hosts, so etcd cluster nodes cannot be specified by their /etc/hosts aliases, only by IP addresses or resolvable domain names.
On 1068.10.0 this worked fine.

CoreOS Version

$ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1122.2.0
VERSION_ID=1122.2.0
BUILD_ID=2016-09-06-1449
PRETTY_NAME="CoreOS 1122.2.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

Digital Ocean 2 GB / 40 GB Disk / AMS3 - CoreOS 1122.2.0 (stable)

Having /etc/hosts like:

127.0.0.1 cluster-node-01
178.62.235.191 cluster-node-02

And a cloud-config section like:

...
coreos:
  ...
  flannel:
    ...
    etcd_endpoints: http://cluster-node-01:4001,https://cluster-node-02:2379
    ...
  ...
...

Expected Behavior

flanneld resolves the etcd endpoint aliases via /etc/hosts and starts normally.

Actual Behavior

flanneld fails to start, and systemctl status flanneld shows the following error:

● flanneld.service - Network fabric for containers
   Loaded: loaded (/usr/lib64/systemd/system/flanneld.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/flanneld.service.d
           └─50-network-config.conf
   Active: activating (start) since Mon 2016-09-12 06:26:35 UTC; 24s ago
     Docs: https://github.com/coreos/flannel
  Process: 9212 ExecStartPre=/usr/bin/etcdctl set /flannel/network/config {"Network":"10.2.0.0/16","Backend":{"Type":"vxlan"}} (code=exited, status=0/SUCCESS)
  Process: 9210 ExecStartPre=/usr/bin/mkdir -p ${ETCD_SSL_DIR} (code=exited, status=0/SUCCESS)
  Process: 9207 ExecStartPre=/usr/bin/mkdir -p /run/flannel (code=exited, status=0/SUCCESS)
  Process: 9196 ExecStartPre=/sbin/modprobe ip_tables (code=exited, status=0/SUCCESS)
 Main PID: 9223 (flanneld)
    Tasks: 5
   Memory: 15.8M
      CPU: 1.222s
   CGroup: /system.slice/flanneld.service
           └─9223 /opt/bin/flanneld --ip-masq=true

Sep 12 06:26:50 cluster-node-01 rkt[9223]: E0912 06:26:50.680935 09223 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
Sep 12 06:26:51 cluster-node-01 rkt[9223]: E0912 06:26:51.691050 09223 network.go:53] Failed to retrieve network config: client: etcd cluster is unavailable or misconfigured
...

Other Information

The etcd cluster is reachable and healthy.
etcdctl cluster-health:

member 8116a2a3963d30ee is healthy: got healthy result from https://cluster-node-02:2379
member aa5bbeea51d0359f is healthy: got healthy result from https://cluster-node-01:2379
cluster is healthy

fleetctl list-machines:

MACHINE      IP              METADATA
94ec36b1...  178.62.235.191  hostname=cluster-node-02
c89a4bb7...  178.62.235.13   hostname=cluster-node-01

Changing the flannel section in cloud-config to:

...
coreos:
  ...
  flannel:
    ...
    etcd_endpoints: http://127.0.0.1:4001,https://178.62.235.191:2379
    ...
  ...
...

fixes the problem and everything is OK.
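
As a stopgap, the alias-to-IP substitution described here could be scripted rather than done by hand. This is only a minimal sketch: lookup_host and resolve_endpoints are hypothetical helper names (not part of CoreOS, flannel, or etcd), and the code assumes every endpoint has the form scheme://host:port with a hostname or IPv4 address as the host.

```shell
#!/bin/sh
# Hypothetical helper: print the address of the first hosts-file entry
# that lists NAME as a hostname or alias.
# Usage: lookup_host NAME [HOSTS_FILE]   (HOSTS_FILE defaults to /etc/hosts)
lookup_host() {
  awk -v name="$1" '
    { sub(/#.*/, "") }   # drop comments before matching
    { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
  ' "${2:-/etc/hosts}"
}

# Hypothetical helper: rewrite a comma-separated endpoint list, replacing
# each host found in the hosts file with its address. Hosts that are not
# found (for example plain IPs) are left unchanged.
# Assumes scheme://host:port; IPv6 literals are not handled.
# Usage: resolve_endpoints ENDPOINTS [HOSTS_FILE]
resolve_endpoints() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r ep; do
    scheme=${ep%%://*}   # text before "://"
    rest=${ep#*://}      # host:port
    host=${rest%%:*}
    port=${rest##*:}
    ip=$(lookup_host "$host" "$2")
    printf '%s://%s:%s\n' "$scheme" "${ip:-$host}" "$port"
  done | paste -sd, -
}
```

Calling resolve_endpoints with the alias-based etcd_endpoints value and the /etc/hosts shown in the Environment section should yield the IP-based value used in the workaround.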

Looking at the output of systemctl cat flanneld, we see the following:

...
ExecStart=/usr/bin/rkt run --net=host \
   --stage1-path=/usr/lib/rkt/stage1-images/stage1-fly.aci \
   --insecure-options=image \
   --set-env=NOTIFY_SOCKET=/run/systemd/notify \
   --inherit-env=true \
   --volume runsystemd,kind=host,source=/run/systemd,readOnly=false \
   --volume runflannel,kind=host,source=/run/flannel,readOnly=false \
   --volume ssl,kind=host,source=${ETCD_SSL_DIR},readOnly=true \
   --volume certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
   --volume resolv,kind=host,source=/etc/resolv.conf,readOnly=true \
   --mount volume=runsystemd,target=/run/systemd \
   --mount volume=runflannel,target=/run/flannel \
   --mount volume=ssl,target=${ETCD_SSL_DIR} \
   --mount volume=certs,target=/etc/ssl/certs \
   --mount volume=resolv,target=/etc/resolv.conf \
   ${FLANNEL_IMG}:${FLANNEL_VER} \
   --exec /opt/bin/flanneld \
   -- --ip-masq=true
...

So /etc/resolv.conf is mounted into the container but /etc/hosts is not, which is why the endpoints cannot be resolved by their aliases.

Proposals to fix

  1. Just mount /etc/hosts into the container.

  2. Expose extra rkt options through an environment variable such as RKT_OPTS in the flanneld.service definition, so that anyone who needs additional mounts or parameters for flanneld can add them via a drop-in file like:

    Environment="RKT_OPTS=--volume=etc-hosts,kind=host,source=/etc/hosts --mount volume=etc-hosts,target=/etc/hosts"
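
For proposal 1, the extra flags could mirror the existing resolv.conf volume and mount. Until the shipped unit changes, one possible way to try this locally is a drop-in that clears and redefines ExecStart. The sketch below has not been verified on 1122.2.0. The volume name hosts and the helper name print_hosts_dropin are assumptions; the ExecStart body is copied from the systemctl cat output above with two lines added.

```shell
#!/bin/sh
# Hypothetical helper: print a drop-in for flanneld.service that overrides
# ExecStart and additionally mounts the host's /etc/hosts read-only.
print_hosts_dropin() {
  cat <<'EOF'
[Service]
# An empty ExecStart= clears the command inherited from the main unit.
ExecStart=
ExecStart=/usr/bin/rkt run --net=host \
   --stage1-path=/usr/lib/rkt/stage1-images/stage1-fly.aci \
   --insecure-options=image \
   --set-env=NOTIFY_SOCKET=/run/systemd/notify \
   --inherit-env=true \
   --volume runsystemd,kind=host,source=/run/systemd,readOnly=false \
   --volume runflannel,kind=host,source=/run/flannel,readOnly=false \
   --volume ssl,kind=host,source=${ETCD_SSL_DIR},readOnly=true \
   --volume certs,kind=host,source=/usr/share/ca-certificates,readOnly=true \
   --volume resolv,kind=host,source=/etc/resolv.conf,readOnly=true \
   --volume hosts,kind=host,source=/etc/hosts,readOnly=true \
   --mount volume=runsystemd,target=/run/systemd \
   --mount volume=runflannel,target=/run/flannel \
   --mount volume=ssl,target=${ETCD_SSL_DIR} \
   --mount volume=certs,target=/etc/ssl/certs \
   --mount volume=resolv,target=/etc/resolv.conf \
   --mount volume=hosts,target=/etc/hosts \
   ${FLANNEL_IMG}:${FLANNEL_VER} \
   --exec /opt/bin/flanneld \
   -- --ip-masq=true
EOF
}
```

The output could be written to a file under /etc/systemd/system/flanneld.service.d/ (for example a hypothetical 40-etc-hosts.conf), followed by systemctl daemon-reload and systemctl restart flanneld.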
    