Ubuntu/devel #5258

Merged
merged 95 commits into from
May 3, 2024

Collaborator

@blackboxsw blackboxsw commented May 3, 2024

Perform new_upstream snapshot of main into oracular for release.
Consolidate debian/changelog entries.

process to create branch

# Sync released d/changelog from noble 24.1.x and add it before the latest local d/changelog entry
git show upstream/ubuntu/noble-24.1.x:debian/changelog
# Manually redact any LP bugs already fixed in released 24.1.x changelog entries to avoid duplicate Fixes comments in the latest changelog entry for oracular
new_upstream_snapshot.py
# Manually consolidate duplicated `Upstream snapshot based on` entries in the top-most changelog entry

test procedure followed:

build-package
sbuild --dist=oracular --arch=amd64 --arch-all ../out/cloud-init_24.2~2g370e680c-0ubuntu1.dsc

TheRealFalcon and others added 30 commits March 27, 2024 08:27
Bump the version in cloudinit/version.py to 24.1.3 and
update ChangeLog.
Address assignment and link management is manual for isc-dhcp-client
whereas dhcpcd brings up its own interface and assigns the IP address.

Interface rename code assumes that the link will be down for rename.
Make sure to set dhcpcd's interface to the same state.
Rebooting an instance which has finished VMware guest
customization with DataSourceVMware will load
DataSourceNone because metadata is NOT available.

This is mostly a re-post of PR#229, with a few differences:
1. Let ds decide if fallback is allowed, not always fall back
   to previous cached LOCAL ds.
2. No comparing instance-id of cached ds with previous instance-id
   because I think they are always identical.

Fixes canonicalGH-3402
When cloud-init finds any ipv6 information in the instance metadata, it
automatically enables dhcp6 for the network interface. However, this
brings up the instance with a broken IPv6 configuration because SLAAC
should be used for almost all situations on EC2.

Red Hat BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2092459
Fedora Pagure: https://pagure.io/cloud-sig/issue/382
Upstream: https://bugs.launchpad.net/cloud-init/+bug/1976526

Fixes canonicalGH-3980

Signed-off-by: Major Hayden <major@redhat.com>
On most distros, including Ubuntu, the default timeout for dhclient is 300s.
There is no cloud-init controlled duration for the dhclient process as
it doesn't fork until after it receives an IP address and there is no timeout
value passed to subp().

I have seen some distros configure dhclient with a timeout of 60s, but
that is far less common.

Given that a cloud VM is not very useful without DHCP, err on the generous
side and allow up to 300 seconds for dhcpcd to get an address.

Note that there is still an issue with dhcpcd retries which will be
addressed later in a separate PR.

Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
Update various hard-coded filepaths. Also make sure we
bootstrap our Paths() config correctly so that we read
from the configured rundir.

Co-authored-by: Mina Galić <freebsd@igalic.co>
Sponsored by: The FreeBSD Foundation

Fixes canonicalGH-4766
…se (canonical#5128)

Seeing a fairly large number of lease parsing failures on Azure similar
to:
```
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 851, in _get_data
    crawled_data = util.log_time(
                   ^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2828, in log_time
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 660, in crawl_metadata
    self._wait_for_pps_savable_reuse()
  File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 1236, in _wait_for_pps_savable_reuse
    self._wait_for_hot_attached_primary_nic(nl_sock)
  File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 1142, in _wait_for_hot_attached_primary_nic
    primary_nic_found = self._setup_ephemeral_networking(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 45, in impl
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 440, in _setup_ephemeral_networking
    lease = self._ephemeral_dhcp_ctx.obtain_lease()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/net/ephemeral.py", line 293, in obtain_lease
    self.lease = maybe_perform_dhcp_discovery(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 103, in maybe_perform_dhcp_discovery
    return distro.dhcp_client.dhcp_discovery(interface, dhcp_log_func, distro)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 656, in dhcp_discovery
    lease = self.get_newest_lease(interface)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 829, in get_newest_lease
    return self.parse_dhcpcd_lease(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/cloudinit/net/dhcp.py", line 787, in parse_dhcpcd_lease
    lease = dict(
            ^^^^^
ValueError: dictionary update sequence element #0 has length 1; 2 is required
```

Catch this error in parse_dhcpcd_lease() and raise
InvalidDHCPLeaseFileError after logging an error.
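
The failure mode in the traceback can be reproduced with a small sketch. This is not the cloud-init implementation: `parse_lease_dump` is a hypothetical helper, the exception class is a local stand-in for cloud-init's `InvalidDHCPLeaseFileError`, and the `KEY=VALUE` line format is assumed from the `dict(...)` call shown above.

```python
import logging

LOG = logging.getLogger(__name__)


class InvalidDHCPLeaseFileError(Exception):
    """Local stand-in for cloud-init's exception of the same name."""


def parse_lease_dump(lease_dump: str) -> dict:
    """Parse KEY=VALUE lines into a dict (hypothetical helper)."""
    try:
        return dict(
            line.split("=", maxsplit=1)
            for line in lease_dump.strip().replace("'", "").split("\n")
        )
    except ValueError as error:
        # dict() raises ValueError when a line has no "=" and therefore
        # splits into one element instead of a (key, value) pair.
        LOG.error("Error parsing dhcpcd lease: %r", error)
        raise InvalidDHCPLeaseFileError from error
```

Callers then only need to handle one well-defined exception instead of a bare `ValueError` escaping from deep inside lease parsing.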

Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
After [0, 1], dhcp6 is going to be always false after upgrading
cloud-init. Correct this in the integration test.

Refs:
[0] canonical#3980
[1] https://bugs.launchpad.net/cloud-init/+bug/1976526
Don't log sensitive data.

Since /var/log/cloud-init.log is a privileged file, this does not expose a
secure system (no CVE). However, we don't want to log this information so that
users can file reports without having to manually redact logs.

Standardize log messages so that redacted and non-redacted logs match.
…5145)

This reverts commit f0fb841.

It appears that this bug was fixed already via another patch sometime
between the time I found the issue and submitted the PR canonical#5104. This
patch isn't needed any longer and I want to avoid causing additional
problems.

Signed-off-by: Major Hayden <major@redhat.com>
…cal#5144)

In scenarios where a lot of retries are expected, Ubuntu 24.04 fails
regularly with "Too many open files".  The `ulimit -n` shows the same
number of allowed open files in Ubuntu 20.04 (1024), but the connections
don't close on 24.04. As retries get close to 1024 in readurl(), the
open file limit is hit and exceptions sprout up in a number of places.

It appears that the reuse of Session's context manager triggers a
connection leak on python3-requests used in 24.04 when saving references
to the requests (in excps[]).

- drop `with session as sess` context manager. Session should be able
  to handle all retry attempts without a context manager

- raise exceptions immediately when required rather than saving them to
  excps[] to raise outside of the exception handler
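
A rough sketch of the second bullet, independent of python3-requests: the retry loop re-raises from inside the exception handler on the last attempt instead of appending each failure to a list. The names `fetch_with_retries` and `fetch` are hypothetical, and `OSError` is only a stand-in for whatever exception types readurl() actually handles.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T], retries: int, sleep_s: float = 0.0
) -> T:
    """Call fetch() up to retries + 1 times, re-raising the last failure."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError:
            if attempt == retries:
                # Raise immediately from the handler. No list of saved
                # exceptions keeps earlier failed attempts referenced.
                raise
            time.sleep(sleep_s)
    raise AssertionError("unreachable")
```

Because nothing holds on to the earlier exceptions, whatever they reference can be released as soon as each handler exits.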

Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
…tion (canonical#5146)

Cloud-init does not configure rfc3442-classless-static-routes if dhclient isn't
patched to support them or it is not configured with:

```
option rfc3442-classless-static-routes code 121 = array of unsigned integer 8;
```

Example lease with option configured (typical):

lease {
  interface "eth0";
  <...cut...>
  option rfc3442-classless-static-routes 0,10,0,0,1,32,168,63,129,16,10,0,0,1,32,169,254,169,254,10,0,0,1;
  <...cut...>
}

Example lease without option, where it is presented as "unknown-121":

lease {
  interface "eth0";
  <...cut...>
  option unknown-121 0:a:0:0:1:20:a8:3f:81:10:a:0:0:1:20:a9:fe:a9:fe:a:0:0:1;
  <...cut...>
}

The primary difference is that dhclient outputs the bytes in a
hex-encoded format and with `:` delimiter. Extend existing
parsing to support this format.
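
As an illustration of the two encodings, here is a minimal sketch (not the patch itself) that tokenizes either form and decodes the result using the RFC 3442 layout: a prefix length, the significant destination octets, then four router octets. Both function names are hypothetical.

```python
from typing import List, Tuple


def tokenize_option_121(value: str) -> List[int]:
    """Return the option bytes as integers for either dhclient format."""
    if ":" in value:
        # "unknown-121" form: hex-encoded bytes separated by ":"
        return [int(tok, 16) for tok in value.split(":")]
    # Named-option form: decimal bytes separated by ","
    return [int(tok) for tok in value.split(",")]


def decode_classless_routes(tokens: List[int]) -> List[Tuple[str, str]]:
    """Decode RFC 3442 bytes into (destination/prefix, router) pairs."""
    routes = []
    i = 0
    while i < len(tokens):
        prefix_len = tokens[i]
        significant = (prefix_len + 7) // 8  # destination octets present
        end = i + 1 + significant + 4
        if prefix_len > 32 or end > len(tokens):
            raise ValueError("invalid or truncated classless static routes")
        # Pad the destination with zero octets up to a full IPv4 address.
        dest = tokens[i + 1 : i + 1 + significant] + [0] * (4 - significant)
        router = tokens[i + 1 + significant : end]
        routes.append(
            (
                ".".join(map(str, dest)) + f"/{prefix_len}",
                ".".join(map(str, router)),
            )
        )
        i = end
    return routes
```

Fed the `unknown-121` value from the example lease, this yields the same three routes that appear in the Azure log below.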

With a couple added INFO logs, here is a sample DHCP on Azure with
static routes being parsed from unknown-121 option with this patch:

```
2024-04-04 16:12:01,677 - ephemeral.py[DEBUG]: Received dhcp lease on eth0 for 10.0.0.11/255.255.255.0
2024-04-04 16:12:01,677 - dhcp.py[INFO]: Parsing: '0:a:0:0:1:20:a8:3f:81:10:a:0:0:1:20:a9:fe:a9:fe:a:0:0:1'
2024-04-04 16:12:01,677 - dhcp.py[INFO]: Tokens: ['0', '10', '0', '0', '1', '32', '168', '63', '129', '16', '10', '0', '0', '1', '32', '169', '254', '169', '254', '10', '0', '0', '1']
2024-04-04 16:12:01,677 - ephemeral.py[DEBUG]: Attempting setup of ephemeral network on eth0 with 10.0.0.11/24 brd 10.0.0.255
2024-04-04 16:12:01,677 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '10.0.0.11/24', 'broadcast', '10.0.0.255', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2024-04-04 16:12:01,679 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'eth0', 'up'] with allowed return codes [0] (shell=False, capture=True)
2024-04-04 16:12:01,681 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'append', '0.0.0.0/0', 'via', '10.0.0.1', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2024-04-04 16:12:01,683 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'append', '168.63.129.16/32', 'via', '10.0.0.1', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2024-04-04 16:12:01,684 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'append', '169.254.169.254/32', 'via', '10.0.0.1', 'dev', 'eth0'] with allowed return codes [0] (shell=False, capture=True)
2024-04-04 16:12:01,686 - handlers.py[DEBUG]: start: azure-ds/_check_if_primary: _check_if_primary
2024-04-04 16:12:01,686 - handlers.py[DEBUG]: finish: azure-ds/_check_if_primary: SUCCESS: _check_if_primary
2024-04-04 16:12:01,687 - azure.py[DEBUG]: Obtained DHCP lease on interface 'eth0' (primary=True driver='hv_netvsc' router='10.0.0.1' routes=[('0.0.0.0/0', '10.0.0.1'), ('168.63.129.16/32', '10.0.0.1'), ('169.254.169.254/32', '10.0.0.1')] lease={'interface': 'eth0', 'fixed-address': '10.0.0.11', 'server-name': 'BL24A1071918060SOC', 'subnet-mask': '255.255.255.0', 'dhcp-lease-time': '4294967295', 'routers': '10.0.0.1', 'dhcp-message-type': '5', 'domain-name-servers': '168.63.129.16', 'dhcp-server-identifier': '168.63.129.16', 'dhcp-renewal-time': '4294967295', 'unknown-121': '0:a:0:0:1:20:a8:3f:81:10:a:0:0:1:20:a9:fe:a9:fe:a:0:0:1', 'dhcp-rebinding-time': '4294967295', 'unknown-245': 'a8:3f:81:10', 'domain-name': 'fyoqc4gghleevjxtq4h4pjbded.bx.internal.cloudapp.net', 'renew': '0 2160/05/11 22:40:16', 'rebind': '0 2160/05/11 22:40:16', 'expire': '0 2160/05/11 22:40:16'} imds_routed=True wireserver_routed=True)
```

Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
In the Alibaba Cloud scenario, we do not wish to define routing priority based on MAC addresses.
In a cloud environment where the kernel parameter net.ifnames=0 has been configured,
network interface card (NIC) names are determined by default according to their underlying Bus, Device, and Function (BDF) numbers,
incrementing from eth0 to ethN, with eth0 acting as the default primary NIC name.

In the previous logic, network-card has the highest priority, followed by device-number as the second priority.
When _fallback_nic_order is set to NicOrder.MAC, the mac address takes the third priority.
On the other hand, when _fallback_nic_order is set to NicOrder.NIC_NAME, the NIC name becomes the third priority.

In AWS environments, the default setting remains as _fallback_nic_order = NicOrder.MAC, maintaining the original behavior.
However, in Alibaba Cloud scenarios, we set _fallback_nic_order = NicOrder.NIC_NAME.
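
The three-level priority described above could be sketched roughly as follows. The `NicOrder` members mirror the names in this message, but `sort_nics` and the dict layout of each NIC are assumptions made for illustration, not the datasource code.

```python
from enum import Enum


class NicOrder(Enum):
    MAC = 1
    NIC_NAME = 2


def sort_nics(nics, fallback_nic_order):
    """Order NICs by network-card, then device-number, then the fallback key."""

    def sort_key(nic):
        if fallback_nic_order is NicOrder.MAC:
            fallback = nic["mac"]
        else:
            fallback = nic["name"]
        return (nic["network-card"], nic["device-number"], fallback)

    return sorted(nics, key=sort_key)
```

Note that this sketch compares names as plain strings, so `eth10` would sort before `eth2`; a real implementation would need a numeric-aware comparison.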
…l#5122)

FreeBSD obtains the device partition name with the function find_freebsd_part.
The function's parameter is the mounted device name, usually the first field
of /etc/fstab, like /dev/gpt/rootfs. However, FreeBSD's fstab also supports gptid or
ufsid as a unique id to identify partitions, like /dev/gptid/xxx and
/dev/ufsid/xxx. Update the function to support these forms.
Various different activators, datasources, and networking code
implementations make use of manual iproute2 calls, which has led to
much code duplication in the codebase.

This is a small step towards replacing distro assumptions at call sites
with common interfaces, which will simplify future refactors for more
distro-agnostic code. These same abstractions will also enable simpler
testing.
PyYAML has built-in unicode support in Python3+.

The original code[1] was added as a helper to add
support for unicode to `yaml.safe_load()`. We
don't need this anymore, and can jettison it and
prefer `yaml.safe_load()`.

[1] a7a9de1
Alpine uses mdev for device mapper. BSDs don't have device mapper.
Many failures are being treated as warnings instead of errors due
to usage of logexc() to emit the failure.

Add log_level parameter to allow increasing the log level without
requiring an additional log.
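
A minimal sketch of what such a helper might look like, assuming the default level stays at WARNING and the traceback is still emitted at DEBUG; the actual signature in cloudinit/util.py may differ.

```python
import logging


def logexc(log, msg, *args, log_level=logging.WARNING):
    """Log msg at log_level, then the active traceback at DEBUG."""
    log.log(log_level, msg, *args)
    log.debug(msg, *args, exc_info=True)
```

A call site that should surface as an error can then pass `log_level=logging.ERROR` from inside its `except` block instead of adding a second log call.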

Add tests, but I'm unsure why it's not logging the backtrace when
the failure occurs within the test method.

Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
ashuntu and others added 21 commits April 29, 2024 16:04
Currently, WSL only supports manual cloud-init configurations provided
in the host Windows filesystem. This adds support for Landscape/Ubuntu
Pro for WSL to provide cloud-init configurations and have them merged 
with or override manual user configurations. This adds support for 
organizations to better provision WSL instances using cloud-init.

Co-authored-by: Carlos Nihelton <carlosnsoliveira@gmail.com>
Co-authored-by: Chad Smith <chad.smith@canonical.com>
None of the unit tests should be reaching out to the network.
Since Alpine tests run under LXD, we can still easily run tests there
without network.

Also, include the tzdata package that was missed in 725f5fb and
remove unneeded network debug lines.
Attempting a `cloud-init clean --reboot` fails on alpine because this
command is hard-coded. This code already has an Init object available,
which has a Distro attribute. Stamp out the duplicate hard-coded
implementation and re-use distro reboot code to acquire cross-distro
compatibility.

Error:
Could not reboot this system using "['shutdown', '-r', 'now']": Unexpected error while running command.
Command: ['shutdown', '-r', 'now']
Exit code: -
Reason: [Errno 2] No such file or directory: b'shutdown'
Stdout: -
Stderr: -
…5239)

During the validation process, the network schema is extracted without
the network key.  As such, the schema validation should work
either with or without the top-level network key.  This change
updates the schema and adds a unit test to validate.
EC2 documents that the system-uuid may be reported in different endianness[1].

A user has reported a case where cloud-init is broken due to inability to
detect the system platform. Fix it.

Behavior change:

Cloud-init was previously making the assumption that uuid and serial would match
on ec2. This assumption was:

1) not documented as a valid way to identify ec2[1]

2) proven invalid on ec2 by the DMI_PRODUCT_SERIAL and DMI_PRODUCT_UUID reported in canonical#5105

3) used in the logic which warns about not running on the "real" ec2

Preserving this warning logic exactly as it was presents several challenges:

a) Risk of regression outside of our control: Since this logic relied upon undocumented
   behavior, AWS could change this at any point, which would break all cloud-init
   instances.

b) Risk of incorrect implementation: What format is the uuid and product serial actually
   in? We don't know. It's easy and safe to just swap the byteorder of the first segment
   of the uuid because this is documented, but matching the whole uuid is problematic
   because UUID formats may be presented as mixed encoding (partially little endian and
   partially big endian). To implement this behavior while fixing this bug we would have
   to make even more assumptions than before. I propose we stop assuming and if a cloud
   happens to implement the same as EC2 (minus the serial/product match), then we just
   don't emit that warning. It's simpler, it's safer, and I really don't think that it is
   a huge change. This is a "change in behavior", but the change is that the code more
   correctly identifies EC2 and would no longer emit a warning on valid ec2 instances, so
   I don't think that this would require omitting this change from SRU.

c) Implementing whatever assumptions we make in b) would require implementing a
   byteswapping algorithm in POSIX shell, which is possible but best to avoid this if
   possible.
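
For illustration only, the documented check (swap the byte order of the first UUID segment, then look for the `ec2` prefix) might be sketched in Python like this. The function names and the sample UUIDs in the usage note are assumptions, and this is not the ds-identify shell logic.

```python
def swap_first_segment(uuid: str) -> str:
    """Reverse the byte order of the first dash-separated UUID segment."""
    first, sep, rest = uuid.partition("-")
    return bytes.fromhex(first)[::-1].hex() + sep + rest


def uuid_suggests_ec2(uuid: str) -> bool:
    """Return True if the UUID starts with "ec2" in either byte order."""
    uuid = uuid.strip().lower()
    if uuid.startswith("ec2"):
        return True
    try:
        return swap_first_segment(uuid).startswith("ec2")
    except ValueError:
        # The first segment was not valid hex, so it cannot match.
        return False
```

With these definitions, both `EC2AE145-D1DC-13B2-94ED-012345ABCDEF` and `45E12AEC-DCD1-B213-94ED-012345ABCDEF` are accepted, since only the first segment is compared after the swap.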

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/identify_ec2_instances.html

Fixes canonicalGH-5105
Commit acc68de introduced a change which no longer builds a wheel,
however integration tests now fail when dependencies are not available.

Include the base requirements in test-requirements.txt.

Fixes canonicalGH-5210
When it passes locally, it would be good to know why ci fails.
…nonical#5251)

The PPA provided to CLOUD_INIT_CLOUD_INIT_SOURCE can contain a lower
version of cloud-init than what is currently released for a given
series. In those cases install an apt preferences file to pin
the cloud-init installed to the given PPA regardless of the published
version.
- Add missing templates for chrony and ntp configuration files.
- AlmaLinux OS is binary compatible with RHEL
  and CloudLinux OS based on AlmaLinux OS.

So, let's use distro-specific configurations from rhel.

Signed-off-by: Elkhan Mammadli <elkhan.mammadli@protonmail.com>
…nical#5226)

cc_mounts configures a `Requires=cloud-init.service` in the
configured mount unit via `x-systemd.requires=cloud-init.service`.
This creates a requirement dependency on cloud-init.service, even if
cloud-init is disabled.

Fix this by changing the mount unit dependency to
`x-systemd.after=cloud-init.service`.

Fixes canonicalGH-2815
Do not validate network config against cloud-init's network v2
schema on netplan systems, because the netplan schema supports keys not
present in cloud-init's v2 schema. This can result in schema warnings
from cloud-init for perfectly acceptable netplan config keys.

Given that cloud-init performs a clean passthrough of network
version 2 directly to netplan without trying to process the network
configuration, there is little value in providing such schema warnings
unless cloud-init's network v2 schema is aligned with the specific
netplan schema supported for each release.

On mantic and later, cloud-init will call netplan's python API
to validate schema with netplan itself, but prior to Ubuntu Mantic
no python API exists, so we cannot validate network v2 against netplan.

Update skip messaging and integration tests.
Since lxc/lxcfs#292 has been fixed, we can use
util.uptime().

On a freshly booted container the following script:
```
import os, time
from cloudinit.util import uptime
print(f"{uptime(),os.stat('/proc/1/cmdline').st_atime, time.monotonic()}")
```

Shows the following output:

('20.09', 1714515317.703158, 28017.975603713)

Since 20 seconds is much closer to the expected uptime than the other methods,
use `util.uptime()`.
Harden cloud-init against system clock changes by using `time.monotonic()`
and `cloudinit.util.uptime()` instead of `time.time()`.

Use `util.uptime()` when time should increment across cloud-init stages.
Use `time.monotonic()` when only the time delta in a single process matters.
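
As a sketch of the single-process case, a polling loop can compute its deadline from `time.monotonic()` so that a wall-clock jump neither shortens nor extends the wait. The `wait_until` helper is hypothetical and not part of cloud-init.

```python
import time


def wait_until(condition, timeout_s, poll_s=0.01):
    """Poll condition() until it is true or timeout_s seconds elapse."""
    # The deadline is taken from the monotonic clock, so setting the
    # system time forwards or backwards does not affect it.
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```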

Observe the effects of changing system time:

```
>>> time.time()
1714528647.3474798
>>> time.monotonic()
47.738647985
>>> from cloudinit.util import uptime
>>> uptime()
'70.09'
>>> # set time back over 1 year in another terminal
>>> time.time()
1672531205.9688644
>>> time.monotonic()
106.06945439
>>> uptime()
'109.39'
```

Fixes canonicalGH-2423
Fixes canonicalGH-3149
The current descriptions do not follow systemd guidance and are misleading
since some services have multiple roles.

systemd.unit(5) on Description:

    A short human readable title of the unit. This may be used by systemd
    (and other UIs) as a user-visible label for the unit, so this string
    should identify the unit rather than describe it, despite the name.

Yet these service files attempt to describe the unit rather than identify it.

Before:
```
May 01 09:49:49.238629 tiny systemd[1]: Starting cloud-config.service - Apply the settings specified in cloud-config...
```

After:
```
May 01 09:49:49.238629 tiny systemd[1]: Starting cloud-config.service - Cloud-init: Config Stage...
```
Cloud-init closes stdin on startup, which breaks interactively running cloud-init under pdb.

Leave stdin open if it is connected to a tty, which fixes pdb use. Otherwise:

File "/usr/lib/python3.12/bdb.py", line 90, in trace_dispatch
  return self.dispatch_line(frame)
         ^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/bdb.py", line 115, in dispatch_line
    if self.quitting: raise BdbQuit
                      ^^^^^^^^^^^^^
bdb.BdbQuit
------------------------------------------------------------
The program exited via sys.exit(). Exit status: 1
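
A minimal sketch of the tty check, assuming that "closing" stdin means redirecting its file descriptor to /dev/null; `close_stdin_unless_tty` is a hypothetical name and the real function in cloud-init may be structured differently.

```python
import os


def close_stdin_unless_tty(fd: int) -> bool:
    """Redirect fd to /dev/null unless it is a terminal.

    Returns True if the descriptor was redirected.
    """
    if os.isatty(fd):
        # Interactive session (for example under pdb): leave stdin alone.
        return False
    with open(os.devnull) as devnull:
        os.dup2(devnull.fileno(), fd)
    return True
```

It would be called as `close_stdin_unless_tty(sys.stdin.fileno())` early in startup; under pdb on a terminal the call is a no-op, so the debugger can still read commands.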
Member

@TheRealFalcon TheRealFalcon left a comment

Diff and changelog looks good to me!

@blackboxsw blackboxsw merged commit ad0f348 into canonical:ubuntu/devel May 3, 2024
26 checks passed