-
Notifications
You must be signed in to change notification settings - Fork 818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple fixes related to system time #5244
Conversation
Since lxc/lxcfs#292 has been fixed, we can use util.uptime(). On a freshly booted container the following script: ``` import os, time from cloudinit.util import uptime print(f"{uptime(),os.stat('/proc/1/cmdline').st_atime, time.monotonic()}") ``` Shows the following output: ('20.09', 1714515317.703158, 28017.975603713) Since 20 seconds is much closer to the expected uptime than the other methods, use `util.uptime()`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self
@@ -243,19 +242,6 @@ def gather_timestamps_using_systemd(): | |||
# lxc based containers do not set their monotonic zero point to be when | |||
# the container starts, instead keep using host boot as zero point | |||
if util.is_container(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know many people using cloud-init in jails, but i need to test if this code works in FreeBSD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@igalic +1 good idea, upstream cloud-init gets lots of usage and testing in lxc, but it would be good to know how this code performs in jails. Just running the following in a jail would probably tell us all that we need to know:
#!/usr/bin/python3
import os, time
from cloudinit.util import uptime
print(f"{uptime(),os.stat('/proc/1/cmdline').st_atime, time.monotonic()}")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, there is no '/proc/1in a FreeBSD jail, and if there was, there'd be no
/proc/1/cmdline`
FreeBSD doesn't provide PID name spaces (yet, like Solaris zones), and the /proc
is very stripped down for security reasons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, there is no '/proc/1
in a FreeBSD jail, and if there was, there'd be no
/proc/1/cmdline`FreeBSD doesn't provide PID name spaces (yet, like Solaris zones), and the
/proc
is very stripped down for security reasons
I see - the others would still be useful to know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uptime()
takes a different path on BSD, so no /proc/1
needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for completeness:
>>> print(f"{uptime(), time.monotonic()}")
('4598464.319990158', 4598464.320018191)
>>> os.system("uptime")
9:20PM up 53 days, 5:24, 0 users, load averages: 0.16, 0.23, 0.20
by default, FreeBSD jails don't have /proc
mounted; uptime is a syscall which returns the kernel's uptime.
root@ports:~ # ps awuxdj
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND PPID PGID SID JOBC
root 21923 0.0 0.0 12768 2016 - SsJ Mon16 0:00.52 /usr/sbin/syslog 1 21923 21923 0
root 31780 0.0 0.1 14000 3324 5 SJ 21:10 0:00.06 tcsh 31429 31780 31429 1
root 12419 0.0 0.1 13600 2580 5 R+J 21:27 0:00.00 ps awuxdj 31780 12419 31429 1
root 50023 0.0 0.1 16560 2108 3 I+J Mon16 0:00.03 tcsh 49556 50023 49556 1
root@ports:~ #
the oldest process' PPID here is the host system's PID 1
Anyway, I'll look into this as soon as people start seriously using cloud-init in jails, or if I find a good way of testing it under jails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@igalic thanks! it looks like uptime()
should work for jails too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uptime works, but it's the host's uptime, so often can be a lie. then again, I don't know how important this is to be accurate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Sounds good. This isn't a very high priority without a bunch of users. This change is only relevant for analyze
subcommands - users that are manually digging into performance.
Harden cloud-init against system clock changes by using `time.monotonic()` and `cloudinit.util.uptime()` instead of `time.time()`. Use `util.uptime()` when time should be increment across cloud-init stages. Use `time.monotonic()` when only the time delta in a single process matters. Observe the affects of changing system time: ``` >>> time.time() 1714528647.3474798 >>> time.monotonic() 47.738647985 >>> from cloudinit.util import uptime >>> uptime() '70.09' >>> # set time back over 1 year in another terminal >>> time.time() 1672531205.9688644 >>> time.monotonic() 106.06945439 >>> uptime() '109.39' ``` Fixes canonicalGH-2423 Fixes canonicalGH-3149
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. One question inline.
@@ -764,7 +763,8 @@ def status_wrapper(name, args): | |||
|
|||
v1 = status["v1"] | |||
v1["stage"] = mode | |||
v1[mode]["start"] = time.time() | |||
uptime = util.uptime() | |||
v1[mode]["start"] = float(uptime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had assumed that this field (along with finished
) was used to compare against actual timestamps to know when things happened rather than just tracking elapsed time. Maybe that's a bad assumption, though.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had assumed that this field (along with finished) was used to compare against actual timestamps to know when things happened rather than just tracking elapsed time. Maybe that's a bad assumption, though.
The spot that I see it used is in status.py
which uses the relative time between the stages, which is why time.monotonic()
won't work.
I didn't find timestamps compared anywhere, and if the code was doing that with time.time()
, such an implementation would also be susceptible to time change. With uptime in this field it can actually be used to manually correlate with events reported in dmesg / systemd logs / etc, so I think the new code is more correct than it was before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Additional Context
This PR contains two commits related to system time.
The first fixes container to be more accurate with
cloud-init analyze
commands.The second hardens cloud-init against clock changes, which are possible (and likely)
during early boot.
Fixes #2423
Fixes #3149
Test Steps
See commit messages for test steps.
Merge type