machine-id is not reset when instance-id changes #4066

Closed
ubuntu-server-builder opened this issue May 12, 2023 · 5 comments
Labels
launchpad Migrated from Launchpad

Comments

@ubuntu-server-builder
Collaborator

This bug was originally filed in Launchpad as LP: #2003121

Launchpad details
affected_projects = []
assignee = None
assignee_name = None
date_closed = 2023-01-19T21:14:50.502347+00:00
date_created = 2023-01-17T19:52:55.708270+00:00
date_fix_committed = None
date_fix_released = None
id = 2003121
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/2003121
milestone = None
owner = racb
owner_name = Robie Basak
private = False
status = wont_fix
submitter = racb
submitter_name = Robie Basak
tags = []
duplicates = []

Launchpad user Robie Basak(racb) wrote on 2023-01-17T19:52:55.708270+00:00

As discussed in #ubuntu-server just now, it's expected that cloud-init will ensure that machine-id is not carried over when a VM is cloned and this is detectable by an instance-id change.

This would align behaviour with ssh host key regeneration behaviour.

Actual behaviour: currently if a VM is cloned and the instance-id changes, /etc/machine-id remains the same.

@ubuntu-server-builder ubuntu-server-builder added the launchpad Migrated from Launchpad label May 12, 2023
@ubuntu-server-builder
Collaborator Author

Launchpad user Robie Basak(racb) wrote on 2023-01-17T21:18:27.344924+00:00

While experimenting with this, I found that systemd-networkd uses /etc/machine-id to determine the DHCP client identifier, and dnsmasq reissues the same lease if the client identifier is the same. So starting two cloud images using libvirt with its dnsmasq DHCP support from the same "golden image", without cloud-init resetting /etc/machine-id, results in an IP conflict between those two VMs.
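To illustrate why a shared machine-id leads to a shared lease, here is a toy simulation. It is only a sketch: `client_id_from_machine_id` uses a plain SHA-256 as a stand-in and is not systemd-networkd's actual derivation, and `ToyDhcpServer` is a hypothetical model of "reissue the same lease for the same client identifier", not dnsmasq itself. The address pool is made up.

```python
import hashlib


def client_id_from_machine_id(machine_id: str) -> str:
    """Stand-in derivation (assumption): any deterministic function of
    machine-id shows the effect; this is NOT systemd's real algorithm."""
    return hashlib.sha256(machine_id.strip().encode()).hexdigest()[:16]


class ToyDhcpServer:
    """Hypothetical server that keys leases on the client identifier."""

    def __init__(self, pool):
        self._free = list(pool)
        self._leases = {}

    def request(self, client_id: str) -> str:
        # A known client identifier gets its existing lease back.
        if client_id not in self._leases:
            self._leases[client_id] = self._free.pop(0)
        return self._leases[client_id]


def simulate(machine_ids):
    """Return the address each VM would receive, in boot order."""
    server = ToyDhcpServer(
        ["192.168.122.10", "192.168.122.11", "192.168.122.12"]
    )
    return [server.request(client_id_from_machine_id(m)) for m in machine_ids]
```

Calling `simulate` with two identical machine-ids returns the same address twice (the conflict described above), while two distinct machine-ids get distinct addresses.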

@ubuntu-server-builder
Collaborator Author

Launchpad user Brett Holman(holmanb) wrote on 2023-01-19T05:08:26.484986+00:00

Agreed, automating this boot-time step seems ideal from a user experience and identity correctness perspective.

Resetting machine-id is currently expected to be done by the image builder at build time. Taking responsibility for this behavior at runtime carries risk that will need to be evaluated and mitigated prior to introduction. This would require all systemd services that use machine-id to be ordered after (or potentially restarted after, if already started) whichever cloud-init service would be responsible for this behavior.

If this behavior is expected to be default in upstream cloud-init, risk is multiplied across distros, since each distro may have different services and ordering.

Also note that resetting machine-id at runtime may cause a slower boot by forcing delayed ordering of services.

@ubuntu-server-builder
Collaborator Author

Launchpad user Brett Holman(holmanb) wrote on 2023-01-19T21:14:36.810321+00:00

Resetting machine-id at runtime would be a pretty big break from current expectations, and correct implementation would require foreknowledge of services using machine-id that are provided in an image. The potential for bugs due to implementation complexity, potential for boot speed regression caused by services delaying until after machine-id is reset, and expected future burden of such a feature due to changes in services and variation in Ubuntu and other distros make the perceived risk of this feature outweigh the benefit. These complexity, risk, and potential boot speed issues are not present when machine-id is correctly set at boot time, so I'm hesitant to move forward with this request.

I'll mark this "Won't Fix" for now.

In the meantime, I'd like to point users experiencing the same issue towards our build recommendation[1], specifically the --machine-id option.

[1] https://cloudinit.readthedocs.io/en/latest/reference/cli.html#clean
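For image builders who want to verify that the build-time reset actually happened before publishing a golden image, a check along these lines may help. This is a sketch under assumptions: `machine_id_is_reset` and the `image_root` argument are hypothetical names, and the accepted states (missing file, empty file, or the literal `uninitialized`) reflect my reading of the machine-id man page and the cloud-init `clean --machine-id` documentation linked above.

```python
from pathlib import Path


def machine_id_is_reset(image_root: str) -> bool:
    """Return True if <image_root>/etc/machine-id looks reset.

    Assumption: a missing file, an empty file, or the literal string
    "uninitialized" all cause a fresh machine-id to be generated on the
    next boot. Anything else is treated as a baked-in ID.
    """
    path = Path(image_root) / "etc" / "machine-id"
    if not path.exists():
        return True
    content = path.read_text().strip()
    return content in ("", "uninitialized")
```

Run it against the mounted image root (for example `machine_id_is_reset("/mnt/image")`, a made-up path) as a final gate in the build pipeline.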

@ubuntu-server-builder
Collaborator Author

Launchpad user Chad Smith(chad.smith) wrote on 2023-01-20T00:27:36.730359+00:00

"it's expected that cloud-init will ensure that machine-id is not carried over when a VM is cloned and this is detectable by an instance-id change."

I'm not sure that statement above is wholly correct.

The instance-id delta is triggered in more cases than just a clone and first instance boot event.

In recent history (~5 years), some clouds have triggered instance-id changes for the following events, to force cloud-init to re-run all configuration on next boot (or sometimes to apply hotplug NIC configuration):

  • network configuration changes, NIC add/remove
  • user-data changes or vendor-data changes
  • vm clone and cloned image relaunch

Here is systemd's documented stance on machine-id changes per man machine-id:

The machine ID does not change based on local or network configuration
or when hardware is replaced. Due to this and its greater length, it is
a more useful replacement for the gethostid(3) call that POSIX
specifies.

Trying to fold /etc/machine-id regeneration into every instance-id change for cloud-init will be tough to support until we have:

  1. cloud-init growing the smarts to compare previously cached instance data against current metadata from the cloud's instance metadata service, to determine whether the scope of the config changes is limited to just network or storage, and so avoid regenerating the machine-id unnecessarily
  2. an assurance that systemd and systemd-networkd can react appropriately to an updated machine-id on the booting system after networkd is already active

The reason for #2 is that cloud-init is only able to detect instance metadata after the network is already active on the system, and restarting systemd-networkd later in boot is more likely to expose a number of other racy problems.

We may take a look at this further, but the conditions under which we want cloud-init to magically regenerate /etc/machine-id and cope with systemd ordering/costs would need to be limited in scope to avoid triggering other concerns.
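As a rough illustration of the comparison described in item 1, the sketch below diffs cached instance data against current metadata and only asks for a new machine-id when the instance-id change is not explained by network or storage changes alone. Everything here is hypothetical: the flat dict layout, the key names (`instance-id`, `network`, `storage`), and the policy itself are assumptions for illustration, not cloud-init's data model or behavior.

```python
# Hypothetical top-level keys whose changes should NOT reset machine-id.
NON_IDENTITY_KEYS = {"network", "storage"}


def changed_keys(previous: dict, current: dict) -> set:
    """Top-level keys whose values differ between the two snapshots."""
    keys = previous.keys() | current.keys()
    return {k for k in keys if previous.get(k) != current.get(k)}


def should_regenerate_machine_id(previous: dict, current: dict) -> bool:
    """Assumed policy: regenerate only when instance-id changed and the
    accompanying changes are not limited to network/storage config."""
    changed = changed_keys(previous, current)
    if "instance-id" not in changed:
        return False
    other = changed - {"instance-id"}
    if other and other <= NON_IDENTITY_KEYS:
        # The cloud bumped instance-id just to re-run network/storage config.
        return False
    return True
```

Under this policy a NIC hot-add that bumps the instance-id leaves the machine-id alone, while a bare instance-id change (as with a clone) triggers regeneration. It does nothing about the ordering problem in item 2.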

@ubuntu-server-builder
Collaborator Author

Launchpad user Brett Holman(holmanb) wrote on 2023-03-13T16:55:28.868455+00:00

A couple of related details regarding machine-id induced IP collisions:

The duplicate IP caused by duplicate machine-id will not happen with NetworkManager in Focal and later (NetworkManager versions >1.15) by default due to this change[1]. It is still possible to trigger it by setting ipv4.dhcp-client-id=duid.

systemd is unlikely to follow NetworkManager[2], because doing so would only mask the bigger problem (duplicate machine-id on multiple machines).

Therefore, the duplicate IP symptom is limited to distros using systemd-networkd; however, the underlying machine-id issue affects all distros.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/cfd696cc3cf43f5f510046b757949546bcee4cdc
[2] systemd/systemd#9609 (comment)

@ubuntu-server-builder ubuntu-server-builder closed this as not planned May 12, 2023