New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time synchronisation appears to fail silently on Alpha 1590.1.0 #2277

Closed
efinlay24 opened this Issue Dec 5, 2017 · 4 comments

Comments

Projects
None yet
3 participants
@efinlay24

efinlay24 commented Dec 5, 2017

Issue Report

Bug

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1590.1.0
VERSION_ID=1590.1.0
BUILD_ID=2017-11-30-0738
PRETTY_NAME="Container Linux by CoreOS 1590.1.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

Cluster of 5 AWS EC2 m4.xlarge instances.

Expected Behavior

On Stable release 1465.8.0 and Beta release 1576.3.0, time synchronisation works as expected.

$ timedatectl
      Local time: Tue 2017-12-05 14:56:18 UTC
  Universal time: Tue 2017-12-05 14:56:18 UTC
        RTC time: Tue 2017-12-05 14:56:18
       Time zone: UTC (UTC, +0000)
 Network time on: no
NTP synchronized: yes
 RTC in local TZ: no
$ systemctl status systemd-timesyncd
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/usr/lib/systemd/system/systemd-timesyncd.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-12-04 14:20:02 UTC; 24h ago
     Docs: man:systemd-timesyncd.service(8)
 Main PID: 688 (systemd-timesyn)
   Status: "Synchronized to time server 66.85.74.226:123 (2.coreos.pool.ntp.org)."
    Tasks: 2 (limit: 32768)
   Memory: 2.0M
      CPU: 252ms
   CGroup: /system.slice/systemd-timesyncd.service
           └─688 /usr/lib/systemd/systemd-timesyncd

Dec 05 14:23:24 coreos-stable.eu-west-1.compute.internal systemd-timesyncd[688]: Network configuration changed, trying to establish connection.
Dec 05 14:23:24 coreos-stable.eu-west-1.compute.internal systemd-timesyncd[688]: Synchronized to time server 66.85.74.226:123 (2.coreos.pool.ntp.org).
Dec 05 14:23:25 coreos-stable.eu-west-1.compute.internal systemd-timesyncd[688]: Network configuration changed, trying to establish connection.
Dec 05 14:23:25 coreos-stable.eu-west-1.compute.internal systemd-timesyncd[688]: Synchronized to time server 66.85.74.226:123 (2.coreos.pool.ntp.org).
Dec 05 14:23:29 coreos-stable.eu-west-1.compute.internal systemd-timesyncd[688]: Network configuration changed, trying to establish connection.
Dec 05 14:23:29 coreos-stable.eu-west-1.compute.internal systemd-timesyncd[688]: Synchronized to time server 66.85.74.226:123 (2.coreos.pool.ntp.org).
Dec 05 14:23:29 coreos-stable.eu-west-1.compute.internal systemd-timesyncd[688]: Network configuration changed, trying to establish connection.
Dec 05 14:23:29 coreos-stable.eu-west-1.compute.internal systemd-timesyncd[688]: Synchronized to time server 66.85.74.226:123 (2.coreos.pool.ntp.org).
Dec 05 14:49:48 coreos-stable.eu-west-1.compute.internal systemd-timesyncd[688]: Network configuration changed, trying to establish connection.
Dec 05 14:49:48 coreos-stable.eu-west-1.compute.internal systemd-timesyncd[688]: Synchronized to time server 66.85.74.226:123 (2.coreos.pool.ntp.org).
$ stat /var/lib/systemd/clock
  File: '/var/lib/systemd/clock'
  Size: 0         	Blocks: 8          IO Block: 4096   regular empty file
Device: ca09h/51721d	Inode: 166         Links: 1
Access: (0644/-rw-r--r--)  Uid: (  243/systemd-timesync)   Gid: (  243/systemd-timesync)
Context: system_u:object_r:init_var_lib_t:s0
Access: 2017-12-05 14:49:48.000000000 +0000
Modify: 2017-12-05 14:49:48.000000000 +0000
Change: 2017-12-05 14:49:48.000000000 +0000
 Birth: -

The timestamps of the last successful time synchronisation on /var/lib/systemd/clock (as per here) are reliably updated, and generally within the last 15 mins or so.

Actual Behavior

On Alpha release 1590.1.0, time synchronisation is reported as successful:

$ timedatectl
                      Local time: Tue 2017-12-05 15:06:52 UTC
                  Universal time: Tue 2017-12-05 15:06:52 UTC
                        RTC time: Tue 2017-12-05 15:06:52
                       Time zone: UTC (UTC, +0000)
       System clock synchronized: yes
systemd-timesyncd.service active: yes
                 RTC in local TZ: no
$ systemctl status systemd-timesyncd
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/usr/lib/systemd/system/systemd-timesyncd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-12-05 11:32:44 UTC; 3h 34min ago
     Docs: man:systemd-timesyncd.service(8)
 Main PID: 13803 (systemd-timesyn)
   Status: "Synchronized to time server 52.17.30.119:123 (0.coreos.pool.ntp.org)."
    Tasks: 2 (limit: 32768)
   CGroup: /system.slice/systemd-timesyncd.service
           └─13803 /usr/lib/systemd/systemd-timesyncd

Dec 05 14:21:22 coreos-alpha.eu-west-1.compute.internal systemd-timesyncd[13803]: Synchronized to time server 54.194.18.100:123 (0.coreos.pool.ntp.org).
Dec 05 14:21:23 coreos-alpha.eu-west-1.compute.internal systemd-timesyncd[13803]: Network configuration changed, trying to establish connection.
Dec 05 14:21:23 coreos-alpha.eu-west-1.compute.internal systemd-timesyncd[13803]: Synchronized to time server 54.194.18.100:123 (0.coreos.pool.ntp.org).
Dec 05 14:21:27 coreos-alpha.eu-west-1.compute.internal systemd-timesyncd[13803]: Network configuration changed, trying to establish connection.
Dec 05 14:21:27 coreos-alpha.eu-west-1.compute.internal systemd-timesyncd[13803]: Synchronized to time server 54.194.18.100:123 (0.coreos.pool.ntp.org).
Dec 05 14:21:27 coreos-alpha.eu-west-1.compute.internal systemd-timesyncd[13803]: Network configuration changed, trying to establish connection.
Dec 05 14:21:37 coreos-alpha.eu-west-1.compute.internal systemd-timesyncd[13803]: Timed out waiting for reply from 54.194.18.100:123 (0.coreos.pool.ntp.org).
Dec 05 14:21:37 coreos-alpha.eu-west-1.compute.internal systemd-timesyncd[13803]: Synchronized to time server 52.17.30.119:123 (0.coreos.pool.ntp.org).
Dec 05 14:48:42 coreos-alpha.eu-west-1.compute.internal systemd-timesyncd[13803]: Network configuration changed, trying to establish connection.
Dec 05 14:48:42 coreos-alpha.eu-west-1.compute.internal systemd-timesyncd[13803]: Synchronized to time server 52.17.30.119:123 (0.coreos.pool.ntp.org).

However, the timestamps on /var/lib/systemd/clock no longer update:

$ stat /var/lib/systemd/clock
  File: '/var/lib/systemd/clock'
  Size: 0         	Blocks: 8          IO Block: 4096   regular empty file
Device: ca09h/51721d	Inode: 163         Links: 1
Access: (0644/-rw-r--r--)  Uid: (  243/systemd-timesync)   Gid: (  243/systemd-timesync)
Context: system_u:object_r:init_var_lib_t:s0
Access: 2017-12-04 14:54:02.000000000 +0000
Modify: 2017-11-20 09:54:26.000000000 +0000
Change: 2017-11-20 09:54:26.000000000 +0000
 Birth: -

We also see strange time-related errors in our apps on the alpha cluster, where publish processes can complete before they've started with a negative duration, depending on what node of the cluster the apps are running on, and depending on that node's time drift.

If it's useful, I'm happy to share the app logs as well.

Please note that this seems to be case regardless of whether timedatectl set-ntp is true or false - we've tried it enabled and disabled, and it doesn't appear to make a difference.

Upgrading from Stable > Alpha seems to reproduce this issue. Beta doesn't appear to have this problem right now.

Reproduction Steps

  1. Provision a stable CoreOS instance.
  2. Check the output of timedatectl, systemctl status systemd-timesyncd, stat /var/lib/systemd/clock.
  3. Confirm that the timestamps on /var/lib/systemd/clock are updating.
  4. Upgrade to alpha CoreOS.
  5. Repeat step 2.
  6. Confirm that the timestamps on /var/lib/systemd/clock no longer update.

Other Information

@efinlay24

This comment has been minimized.

Show comment
Hide comment
@efinlay24

efinlay24 Dec 8, 2017

Just to note that Beta is now on 1590.2.0, so I'd expect the same issue to occur now in Beta. I've not had time to test that and confirm it, though.

efinlay24 commented Dec 8, 2017

Just to note that Beta is now on 1590.2.0, so I'd expect the same issue to occur now in Beta. I've not had time to test that and confirm it, though.

@dm0-

This comment has been minimized.

Show comment
Hide comment
@dm0-

dm0- Dec 8, 2017

Member

The 1590 releases are using systemd v235, which uses /var/lib/systemd/timesync/clock since systemd/systemd@53d133e . The upstream documentation page you linked shows this path as well. It seems to be functioning as intended for me on alpha. Is your system time not actually syncing?

Member

dm0- commented Dec 8, 2017

The 1590 releases are using systemd v235, which uses /var/lib/systemd/timesync/clock since systemd/systemd@53d133e . The upstream documentation page you linked shows this path as well. It seems to be functioning as intended for me on alpha. Is your system time not actually syncing?

@efinlay24

This comment has been minimized.

Show comment
Hide comment
@efinlay24

efinlay24 Dec 11, 2017

Thanks @dm0- - that explains why the timestamp for last successful sync no longer updates on /var/lib/systemd/clock. Apologies, when I was investigating the issue, the documentation page was still on v234, and referenced the old file.

I've checked /var/lib/systemd/timesync/clock on our test cluster, and the timestamps do appear to be relatively recent:

$ stat /var/lib/systemd/timesync/clock
  File: '/var/lib/systemd/timesync/clock'
  Size: 0         	Blocks: 8          IO Block: 4096   regular empty file
Device: ca09h/51721d	Inode: 1568441     Links: 1
Access: (0644/-rw-r--r--)  Uid: (  243/systemd-timesync)   Gid: (  243/systemd-timesync)
Context: system_u:object_r:init_var_lib_t:s0
Access: 2017-12-11 10:21:05.000000000 +0000
Modify: 2017-12-11 10:21:05.000000000 +0000
Change: 2017-12-11 10:21:05.000000000 +0000
 Birth: -

I'll speak to our devs who raised the issue, and ask them to drop some more info here with exactly what's going wrong with time synchronisation.

efinlay24 commented Dec 11, 2017

Thanks @dm0- - that explains why the timestamp for last successful sync no longer updates on /var/lib/systemd/clock. Apologies, when I was investigating the issue, the documentation page was still on v234, and referenced the old file.

I've checked /var/lib/systemd/timesync/clock on our test cluster, and the timestamps do appear to be relatively recent:

$ stat /var/lib/systemd/timesync/clock
  File: '/var/lib/systemd/timesync/clock'
  Size: 0         	Blocks: 8          IO Block: 4096   regular empty file
Device: ca09h/51721d	Inode: 1568441     Links: 1
Access: (0644/-rw-r--r--)  Uid: (  243/systemd-timesync)   Gid: (  243/systemd-timesync)
Context: system_u:object_r:init_var_lib_t:s0
Access: 2017-12-11 10:21:05.000000000 +0000
Modify: 2017-12-11 10:21:05.000000000 +0000
Change: 2017-12-11 10:21:05.000000000 +0000
 Birth: -

I'll speak to our devs who raised the issue, and ask them to drop some more info here with exactly what's going wrong with time synchronisation.

@bgilbert

This comment has been minimized.

Show comment
Hide comment
@bgilbert

bgilbert May 12, 2018

Member

Closing due to lack of response. If you're seeing time synchronization issues on current Container Linux releases, please open a new bug.

Member

bgilbert commented May 12, 2018

Closing due to lack of response. If you're seeing time synchronization issues on current Container Linux releases, please open a new bug.

@bgilbert bgilbert closed this May 12, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment