
Multiple fixes related to system time #5244

Merged: 2 commits merged into canonical:main on May 2, 2024

Conversation

@holmanb (Member) commented May 1, 2024

Additional Context

This PR contains two commits related to system time.

The first makes the `cloud-init analyze` commands more accurate in containers.
The second hardens cloud-init against clock changes, which are possible (and
likely) during early boot.

Fixes #2423
Fixes #3149

Test Steps

See commit messages for test steps.

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires a commit message per commit, each referencing the pull request number (#<PR_NUM>)

Since lxc/lxcfs#292 has been fixed, we can use
util.uptime().

On a freshly booted container, the following script:
```
import os, time
from cloudinit.util import uptime
print(f"{uptime(),os.stat('/proc/1/cmdline').st_atime, time.monotonic()}")
```

shows the following output:

```
('20.09', 1714515317.703158, 28017.975603713)
```

Since 20 seconds is much closer to the expected uptime than the values from
the other methods, use `util.uptime()`.
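For context, on Linux `util.uptime()` boils down to reading `/proc/uptime`, which lxcfs now virtualizes per container (the lxc/lxcfs#292 fix referenced above). A minimal sketch of that path, not cloud-init's exact code:

```
# Sketch: Linux uptime via /proc/uptime. Inside an lxc container, lxcfs
# virtualizes this file so it reflects the container's own uptime.
def read_uptime() -> str:
    with open("/proc/uptime") as f:
        # File contents: "<seconds up> <seconds idle>"
        return f.read().split()[0]

print(read_uptime())  # e.g. '20.09' on a freshly booted container
```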
@igalic (Collaborator) left a comment:

note to self

```
@@ -243,19 +242,6 @@ def gather_timestamps_using_systemd():
    # lxc based containers do not set their monotonic zero point to be when
    # the container starts, instead keep using host boot as zero point
    if util.is_container():
```
@igalic (Collaborator):

I don't know many people using cloud-init in jails, but I need to test whether this code works on FreeBSD.

@holmanb (Member, Author):

@igalic +1 good idea, upstream cloud-init gets lots of usage and testing in lxc, but it would be good to know how this code performs in jails. Just running the following in a jail would probably tell us all that we need to know:

```
#!/usr/bin/python3
import os, time
from cloudinit.util import uptime
print(f"{uptime(),os.stat('/proc/1/cmdline').st_atime, time.monotonic()}")
```

@igalic (Collaborator):

Well, there is no `/proc/1` in a FreeBSD jail, and if there was, there'd be no `/proc/1/cmdline`.

FreeBSD doesn't provide PID namespaces yet (as Solaris zones do), and its `/proc` is very stripped down for security reasons.

@holmanb (Member, Author):

> Well, there is no `/proc/1` in a FreeBSD jail, and if there was, there'd be no `/proc/1/cmdline`.
>
> FreeBSD doesn't provide PID namespaces yet (as Solaris zones do), and its `/proc` is very stripped down for security reasons.

I see - the other values would still be useful to know.

Member:

`uptime()` takes a different path on BSD, so no `/proc/1` is needed.
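For readers following along, one way BSD systems can report uptime without `/proc` is via the `kern.boottime` sysctl. A hedged sketch of that approach (illustrative only, not cloud-init's actual implementation):

```
import re
import subprocess
import time

# Sketch: derive uptime on FreeBSD from the kern.boottime sysctl,
# which is available inside jails where /proc is typically absent.
def bsd_uptime() -> float:
    out = subprocess.check_output(
        ["sysctl", "-n", "kern.boottime"], text=True
    )
    # Typical output: { sec = 1714515317, usec = 703158 } Tue Apr 30 ...
    boot_sec = int(re.search(r"sec = (\d+)", out).group(1))
    return time.time() - boot_sec
```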

@igalic (Collaborator):

For completeness:

```
>>> print(f"{uptime(), time.monotonic()}")
('4598464.319990158', 4598464.320018191)
>>> os.system("uptime")
 9:20PM  up 53 days,  5:24, 0 users, load averages: 0.16, 0.23, 0.20
```

By default, FreeBSD jails don't have `/proc` mounted; `uptime` is a syscall which returns the kernel's uptime.

```
root@ports:~ # ps awuxdj
USER   PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND           PPID  PGID   SID JOBC
root 21923  0.0  0.0 12768 2016  -  SsJ  Mon16   0:00.52 /usr/sbin/syslog     1 21923 21923    0
root 31780  0.0  0.1 14000 3324  5  SJ   21:10   0:00.06 tcsh             31429 31780 31429    1
root 12419  0.0  0.1 13600 2580  5  R+J  21:27   0:00.00 ps awuxdj        31780 12419 31429    1
root 50023  0.0  0.1 16560 2108  3  I+J  Mon16   0:00.03 tcsh             49556 50023 49556    1
root@ports:~ #
```

The oldest process's PPID here is the host system's PID 1.

Anyway, I'll look into this as soon as people start seriously using cloud-init in jails, or if I find a good way of testing it under jails.
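As an aside, CPython on FreeBSD and OpenBSD also exposes that kernel uptime clock directly (`time.CLOCK_UPTIME` is not available on Linux):

```
import time

# FreeBSD/OpenBSD only: read the kernel's uptime clock directly.
print(time.clock_gettime(time.CLOCK_UPTIME))
```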

@holmanb (Member, Author):

@igalic thanks! It looks like `uptime()` should work for jails too.

@igalic (Collaborator):

`uptime` works, but it's the host's uptime, so it can often be a lie. Then again, I don't know how important it is for this to be accurate.

@holmanb (Member, Author):

+1 Sounds good. This isn't a very high priority without a bunch of users, and this change is only relevant for the analyze subcommands - that is, for users who are manually digging into performance.

Harden cloud-init against system clock changes by using `time.monotonic()`
and `cloudinit.util.uptime()` instead of `time.time()`.

Use `util.uptime()` when time should increase across cloud-init stages.
Use `time.monotonic()` when only the time delta in a single process matters.

Observe the effects of changing the system time:

```
>>> time.time()
1714528647.3474798
>>> time.monotonic()
47.738647985
>>> from cloudinit.util import uptime
>>> uptime()
'70.09'
>>> # set time back over 1 year in another terminal
>>> time.time()
1672531205.9688644
>>> time.monotonic()
106.06945439
>>> uptime()
'109.39'
```

Fixes GH-2423
Fixes GH-3149
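A minimal sketch of the two patterns described above (illustrative, not an excerpt from cloud-init):

```
import time
from cloudinit.util import uptime

# Delta within a single process: time.monotonic() never jumps when the
# wall clock is stepped (e.g. by NTP or an admin during early boot).
start = time.monotonic()
time.sleep(1)
elapsed = time.monotonic() - start  # ~1.0 even if the clock changed

# Values that must stay comparable across cloud-init stages (separate
# processes): Python only guarantees monotonic deltas within a single
# process, so seconds-of-uptime are recorded instead.
stage_start = float(uptime())
```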
@TheRealFalcon (Member) left a comment:

Generally LGTM. One question inline.

```
@@ -764,7 +763,8 @@ def status_wrapper(name, args):

     v1 = status["v1"]
     v1["stage"] = mode
-    v1[mode]["start"] = time.time()
+    uptime = util.uptime()
+    v1[mode]["start"] = float(uptime)
```
@TheRealFalcon (Member):

I had assumed that this field (along with finished) was used to compare against actual timestamps to know when things happened rather than just tracking elapsed time. Maybe that's a bad assumption, though.

Thoughts?

@holmanb (Member, Author):

> I had assumed that this field (along with finished) was used to compare against actual timestamps to know when things happened rather than just tracking elapsed time. Maybe that's a bad assumption, though.

The spot where I see it used is in status.py, which uses the relative time between the stages; that's why `time.monotonic()` won't work.

I didn't find timestamps compared anywhere, and if the code were doing that with `time.time()`, such an implementation would also be susceptible to time changes. With uptime in this field, it can actually be used to manually correlate with events reported in dmesg / systemd logs / etc., so I think the new code is more correct than it was before.
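For illustration, the relative time between stages can be computed from those uptime-based fields roughly like this (a sketch; the path and schema are inferred from the diff above, not quoted from status.py):

```
import json

# Sketch: elapsed time of the "init" stage, from the uptime-based
# timestamps written by status_wrapper(). Path and schema assumed.
with open("/run/cloud-init/status.json") as f:
    v1 = json.load(f)["v1"]

stage = v1["init"]
if stage.get("start") is not None and stage.get("finished") is not None:
    # Both fields hold seconds of system uptime, so their difference is
    # a valid duration even if the wall clock changed in between.
    print(f"init took {stage['finished'] - stage['start']:.2f}s")
```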

@TheRealFalcon self-assigned this on May 2, 2024
@TheRealFalcon (Member) left a comment:

LGTM!

@holmanb merged commit 9b1b294 into canonical:main on May 2, 2024
29 checks passed
Development

Successfully merging this pull request may close these issues:

  • timeout can be wrong if system time change
  • wait_for_url doesn't account for system clock being changed
3 participants