systemd-resolved double suffix on boot, transient. #2606

Open
aminasyan opened this issue Aug 8, 2019 · 4 comments

@aminasyan
commented Aug 8, 2019

Issue Report

On boot, CoreOS fails to mount an NFS share configured in /etc/fstab (also tried with a systemd.mount unit).
The reason given is "Failed to resolve server".

Examining the DNS packets shows that the DNS suffix is appended twice (nfs.example.com.example.com).
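
To see which suffix the resolver might append, it can help to list the configured search domains. A minimal sketch, assuming the search list lives in a resolv.conf-style file; the `search_domains` helper name and the default path are illustrative and not taken from this report:

```shell
# Print the search domains from a resolv.conf-style file, one per line.
# Assumption: the file uses the standard "search dom1 dom2 ..." syntax.
# If several "search" lines exist, only the last one is used, as
# described in resolv.conf(5).
search_domains() {
    file="${1:-/etc/resolv.conf}"
    awk '$1 == "search" {
             out = ""
             for (i = 2; i <= NF; i++) out = out $i "\n"
         }
         END { printf "%s", out }' "$file"
}
```

Running `search_domains` with no argument reads /etc/resolv.conf. On a host using systemd-resolved that file may be a symlink to a generated file, so the output shows what the stub resolver sees rather than the per-link settings.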

Bug

Container Linux Version

coreos ~ # cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=2135.6.0
VERSION_ID=2135.6.0
BUILD_ID=2019-07-30-0722
PRETTY_NAME="Container Linux by CoreOS 2135.6.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
coreos ~ #

Environment

What hardware/cloud provider/hypervisor is being used to run Container Linux?
local hardware system, Supermicro SuperServer 6016TT-IBQF

Expected Behavior

NFS mount works on boot.

Actual Behavior

NFS mount fails on boot

Reproduction Steps

  1. Configure NFS mount on boot via fstab or systemd.mount
  2. ...

Other Information

Logs.

coreos ~ # journalctl -e
Aug 08 18:38:46 coreos.example.com systemd-networkd-wait-online[706]: ignoring: lo
Aug 08 18:38:46 coreos.example.com systemd-networkd-wait-online[706]: ignoring: lo
Aug 08 18:38:46 coreos.example.com systemd-networkd-wait-online[706]: ignoring: lo
Aug 08 18:38:46 coreos.example.com systemd-timesyncd[735]: Network configuration changed, trying to establish connection.
Aug 08 18:38:46 coreos.example.com systemd[1]: rkt-gc.service: Succeeded.
Aug 08 18:38:46 coreos.example.com systemd[1]: Started Garbage Collection for rkt.
Aug 08 18:38:46 coreos.example.com update_engine[763]: I0808 18:38:46.914912 763 main.cc:89] CoreOS Update Engine starting
Aug 08 18:38:46 coreos.example.com systemd[1]: Started Update Engine.
Aug 08 18:38:46 coreos.example.com systemd[1]: Started Cluster reboot manager.
Aug 08 18:38:46 coreos.example.com update_engine[763]: I0808 18:38:46.952653 763 update_check_scheduler.cc:74] Next update chec>
Aug 08 18:38:47 coreos.example.com locksmithd[808]: Reboot strategy is "off" - locksmithd is exiting.
Aug 08 18:38:47 coreos.example.com systemd[1]: locksmithd.service: Succeeded.
Aug 08 18:38:48 coreos.example.com systemd-networkd[620]: enp1s0f0: Gained IPv6LL
Aug 08 18:39:00 coreos.example.com systemd-networkd[620]: enp1s0f0: Configured
Aug 08 18:39:00 coreos.example.com systemd-networkd-wait-online[706]: ignoring: lo
Aug 08 18:39:00 coreos.example.com systemd[1]: Started Wait for Network to be Configured.
Aug 08 18:39:00 coreos.example.com systemd[1]: Reached target Network is Online.
Aug 08 18:39:00 coreos.example.com systemd[1]: home.mount: Directory /home to mount over is not empty, mounting anyway.
Aug 08 18:39:00 coreos.example.com systemd[1]: Mounting /home...
Aug 08 18:39:17 coreos.example.com mount[814]: mount.nfs: Failed to resolve server nfs.example.com: Name or service not known
Aug 08 18:39:17 coreos.example.com systemd[1]: home.mount: Mount process exited, code=exited, status=32/n/a
Aug 08 18:39:17 coreos.example.com systemd[1]: home.mount: Failed with result 'exit-code'.
Aug 08 18:39:17 coreos.example.com systemd[1]: Failed to mount /home.
Aug 08 18:39:17 coreos.example.com systemd[1]: Dependency failed for Remote File Systems.
Aug 08 18:39:17 coreos.example.com systemd[1]: remote-fs.target: Job remote-fs.target/start failed with result 'dependency'.
Aug 08 18:39:17 coreos.example.com systemd[1]: Starting Permit User Sessions...
Aug 08 18:39:17 coreos.example.com systemd[1]: Started Permit User Sessions.
Aug 08 18:39:17 coreos.example.com systemd[1]: Started Serial Getty on ttyS0.
Aug 08 18:39:17 coreos.example.com systemd[1]: Started Getty on tty1.
Aug 08 18:39:17 coreos.example.com systemd[1]: Reached target Login Prompts.
Aug 08 18:39:17 coreos.example.com systemd[1]: Reached target Multi-User System.
Aug 08 18:39:17 coreos.example.com systemd[1]: Startup finished in 4.259s (kernel) + 4.999s (initrd) + 40.602s (userspace) = 49.8>
Aug 08 18:39:18 coreos.example.com systemd-timesyncd[735]: Synchronized to time server for the first time 198.58.105.63:123 (2.co>
Aug 08 18:39:33 coreos.example.com update_engine[763]: I0808 18:39:33.170766 763 update_attempter.cc:493] Updating boot flags...
Aug 08 18:39:42 coreos.example.com systemd[1]: Created slice system-sshd.slice.
Aug 08 18:39:42 coreos.example.com systemd[1]: Started OpenSSH per-connection server daemon (10.64.19.203:58386).
Aug 08 18:39:42 coreos.example.com sshd[836]: Accepted publickey for core from 10.64.19.203 port 58386 ssh2: RSA SHA256:ChUbqqydY>
Aug 08 18:39:42 coreos.example.com sshd[836]: pam_unix(sshd:session): session opened for user core by (uid=0)
Aug 08 18:39:42 coreos.example.com systemd[1]: Created slice User Slice of UID 500.
Aug 08 18:39:42 coreos.example.com systemd[1]: Starting User Runtime Directory /run/user/500...
Aug 08 18:39:42 coreos.example.com systemd-logind[764]: New session 1 of user core.
Aug 08 18:39:42 coreos.example.com systemd[1]: home.mount: Directory /home to mount over is not empty, mounting anyway.
Aug 08 18:39:42 coreos.example.com systemd[1]: Mounting /home...
Aug 08 18:39:42 coreos.example.com systemd[1]: Started User Runtime Directory /run/user/500.
Aug 08 18:39:42 coreos.example.com systemd[1]: Starting User Manager for UID 500...
Aug 08 18:39:42 coreos.example.com systemd[842]: pam_unix(systemd-user:session): session opened for user core by (uid=0)
Aug 08 18:39:42 coreos.example.com systemd[842]: Reached target Sockets.
Aug 08 18:39:42 coreos.example.com systemd[842]: Reached target Paths.
Aug 08 18:39:42 coreos.example.com systemd[842]: Reached target Timers.
Aug 08 18:39:42 coreos.example.com systemd[842]: Reached target Basic System.
Aug 08 18:39:42 coreos.example.com systemd[842]: Reached target Default.
Aug 08 18:39:42 coreos.example.com systemd[842]: Startup finished in 53ms.
Aug 08 18:39:42 coreos.example.com systemd[1]: Started User Manager for UID 500.
Aug 08 18:39:42 coreos.example.com mount[839]: mount.nfs: Failed to resolve server nfs.example.com: Name or service not known
Aug 08 18:39:42 coreos.example.com systemd[1]: home.mount: Mount process exited, code=exited, status=32/n/a
Aug 08 18:39:42 coreos.example.com systemd[1]: home.mount: Failed with result 'exit-code'.
Aug 08 18:39:42 coreos.example.com systemd[1]: Failed to mount /home.
Aug 08 18:39:42 coreos.example.com systemd[1]: Dependency failed for Session 1 of user core.

Tcpdump on DNS server

[root@dns ~]# tcpdump -i eno1 host coreos.example.com and port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eno1, link-type EN10MB (Ethernet), capture size 262144 bytes

11:39:17.879855 IP coreos.example.com.43126 > dns.example.com.domain: 50726+ A? nfs.example.com.example.com. (45)
11:39:17.879884 IP coreos.example.com.46795 > dns.example.com.domain: 50834+ A? 1.coreos.pool.ntp.org.example.com. (51)
11:39:17.879915 IP coreos.example.com.46795 > dns.example.com.domain: 32414+ AAAA? 1.coreos.pool.ntp.org.example.com. (51)
11:39:17.880032 IP dns.example.com.domain > coreos.example.com.46795: 50834 NXDomain 0/0/0 (51)
11:39:17.880050 IP dns.example.com.domain > coreos.example.com.46795: 32414 NXDomain 0/0/0 (51)
11:39:17.881057 IP coreos.example.com.35273 > dns.example.com.domain: 37928+ A? 2.coreos.pool.ntp.org. (39)
11:39:17.881088 IP coreos.example.com.35273 > dns.example.com.domain: 51246+ AAAA? 2.coreos.pool.ntp.org. (39)
11:39:17.957679 IP dns.example.com.domain > coreos.example.com.43126: 50726 NXDomain 0/1/0 (102)
11:39:18.052238 IP dns.example.com.domain > coreos.example.com.35273: 37928 4/9/14 A 198.58.105.63, A 45.76.244.193, A 206.55.191.142, A 184.105.182.15 (477)
11:39:18.053921 IP dns.example.com.domain > coreos.example.com.35273: 51246 4/9/13 AAAA 2607:7c80:55:1005::254, AAAA 2600:3c00::f03c:91ff:fe91:b509, AAAA 2600:3c03::f03c:91ff:fe3e:c3bb, AAAA 2a0d:5600:33:b:1 (509)

^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel
[root@dns ~]#
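
The first query in the capture is the mount target with the domain appended a second time. One possible explanation (an assumption, not confirmed in this report) is ordinary search-list processing: the name is tried as-is, then with each search domain appended, and only the suffixed attempt reached the DNS server. The sketch below only illustrates how such candidate names are formed. `candidates` is a hypothetical helper, and `example.com` is assumed to be the configured search domain:

```shell
# List the names a resolv.conf-style search would try for a host name.
# Usage: candidates NAME SEARCH_DOMAIN...
# Assumption: the default "ndots:1" behaviour from resolv.conf(5), where
# a name containing a dot is tried as-is first and then with each search
# domain appended. A trailing dot marks the name as fully qualified.
candidates() {
    name="$1"
    shift
    case "$name" in
        *.) printf '%s\n' "$name"; return 0 ;;  # absolute: no search list
    esac
    case "$name" in
        *.*) printf '%s.\n' "$name" ;;          # dotted name: as-is first
    esac
    for domain in "$@"; do
        printf '%s.%s.\n' "$name" "$domain"     # then each search domain
    done
    case "$name" in
        *.*) ;;
        *) printf '%s.\n' "$name" ;;            # single label: as-is last
    esac
}
```

For example, `candidates nfs.example.com example.com` lists `nfs.example.com.` followed by `nfs.example.com.example.com.`, and the second name matches the query seen in the capture. In this model a trailing dot suppresses the search list entirely; whether that is a workable workaround for the mount unit is untested.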

@ajeddeloh

commented Aug 8, 2019

Can you post the contents of the mount unit (either a handwritten one or the one that gets generated)?

@aminasyan

Author

commented Aug 8, 2019

core@coreos ~ $ cat /etc/systemd/system/home.mount
[Unit]
Description=NFS Mount of /share
Requires=network-online.target
After=network-online.target

[Mount]
What=nfs.example.com:/data
Where=/home
Type=nfs
Options=nofail,x-systemd.device-timeout=60s,intr,hard,nfsvers=3,timeo=600,proto=tcp,retrans=2

[Install]
WantedBy=remote-fs.target

@aminasyan

Author

commented Aug 8, 2019

This seems to be transient. DNS resolution returns to normal when systemd-timesyncd retries; after that, all DNS queries are fine (no double suffix), but by then the NFS mount has already failed, as the tcpdump output shows.
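
Since resolution recovers on a later retry, one possible workaround is to wait until the server name resolves before attempting the mount. This is a minimal sketch and not a confirmed fix. `wait_for_dns` and the `RESOLVER` override are hypothetical names introduced here, and the default lookup assumes `getent` is available on the host:

```shell
# Retry name resolution until it succeeds or the attempts run out.
# Usage: wait_for_dns HOST [ATTEMPTS] [DELAY_SECONDS]
# Assumption: RESOLVER is a hypothetical override hook for this sketch.
# By default the lookup runs "getent hosts", which goes through the
# system's NSS configuration.
wait_for_dns() {
    host="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # RESOLVER is left unquoted on purpose so that "getent hosts"
        # splits into a command and its argument.
        if ${RESOLVER:-getent hosts} "$host" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "wait_for_dns: $host did not resolve after $attempts attempts" >&2
    return 1
}
```

A call such as `wait_for_dns nfs.example.com 30 2` would poll for up to about a minute. It could be run, for example, from a oneshot unit ordered before home.mount; that unit is not shown here and would need testing on the affected hardware.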

@aminasyan

Author

commented Aug 8, 2019

I have tried with a VM and that works flawlessly. Could it be because the real hardware is slower at initializing the network than a VM?
