@drzee99 drzee99 commented Feb 2, 2026

Fix DNS resolution performance regression during cloud-init local

Summary

This PR addresses critical DNS resolution performance issues during the early cloud-init local stage that cause boot delays of 2+ minutes, particularly with systemd version 259 and later.

Problem

  • Boot delays: 2+ minutes (up from <30 seconds) during cloud-init local
  • Root cause: DNS queries for IP addresses during DNS redirect detection
  • Systemd 259 regression: Recent systemd changes make DNS resolution significantly slower during early boot
  • Legacy URL: Hardcoded DNS-dependent metadata URL that's no longer documented by AWS

Solution

1. Optimize IP address handling in util.py

  • Move IP address detection to function start to bypass all DNS operations
  • Remove duplicate IP check that occurred after expensive DNS queries
  • IP addresses now completely avoid DNS redirect detection
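The early-return idea above can be sketched roughly as follows. This is a minimal illustration, not the actual cloud-init implementation: `is_ip_literal`, `is_resolvable_sketch`, and the injectable `resolver` parameter are hypothetical names, and the real `is_resolvable()` also performs DNS redirect detection, which is omitted here.

```python
import ipaddress
import socket


def is_ip_literal(host: str) -> bool:
    """Return True if host is an IPv4 or IPv6 literal (brackets allowed)."""
    try:
        # Strip the square brackets used around IPv6 literals in URLs.
        ipaddress.ip_address(host.strip("[]"))
        return True
    except ValueError:
        return False


def is_resolvable_sketch(host: str, resolver=socket.getaddrinfo) -> bool:
    """Hypothetical stand-in for is_resolvable() with the IP check first."""
    # Early exit: an IP literal never needs a DNS lookup, so no
    # resolver call (and no redirect detection) happens for it.
    if is_ip_literal(host):
        return True
    try:
        resolver(host, None)
        return True
    except OSError:
        # socket.gaierror is a subclass of OSError.
        return False
```

Because the IP check runs before anything else, a slow resolver during early boot only affects real hostnames.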

2. Remove legacy DNS-dependent URL from DataSourceEc2.py

  • Remove http://instance-data.:8773 which is not in current AWS IMDS documentation
  • Keep only IP-based endpoints that work without DNS resolution

Changes

  • cloudinit/util.py: Early return for IP addresses in is_resolvable()
  • cloudinit/sources/DataSourceEc2.py: Remove legacy DNS-dependent metadata URL

Testing

  • IP addresses return immediately from is_resolvable()
  • Cloud-init local completes in <30 seconds (down from 2+ minutes)
  • IMDS access works without DNS resolution
  • No functional regressions

Related Issues

Fixes #6641 - Systemd version 259 slows down DNS check during cloud-init local

Backward Compatibility

  • ✅ No breaking changes
  • ✅ Maintains all existing functionality
  • ✅ Uses only documented AWS IMDS endpoints

Fixes DNS queries for IP addresses that cause 2+ minute boot delays,
particularly with systemd 259+. Moves IP detection earlier in
is_resolvable() and removes legacy DNS-dependent metadata URL.

Fixes canonical#6641
@blackboxsw blackboxsw self-assigned this Feb 3, 2026
metadata_urls = [
"http://169.254.169.254",
"http://[fd00:ec2::254]",
"http://instance-data.:8773",
Member

On EC2 this resolves to the current IMDS IP address. While this is known to be a specific IP address, there may be some systems that depend on this.

Contributor Author

@drzee99 drzee99 Feb 3, 2026

The problem is that keeping this non-IP address triggers a call to is_resolvable(), which goes through the expensive DNS redirect check. Ref.:

cloudinit/sources/DataSourceEc2.py (line 363)

# Remove addresses from the list that wont resolve.
mdurls = mcfg.get("metadata_urls", self.metadata_urls)
filtered = [x for x in mdurls if util.is_resolvable_url(x)]

"instance-data.:8773" is actually removed from the list (at least on EC2) because it fails the is_resolvable_url() check (which in turn calls is_resolvable()). We see that both before and after the systemd 259 upgrade. From the logs attached to the bug report:

From pre_259_journal.log:
Dec 26 13:36:19 ip-172-31-27-166 python3[318]: [CLOUDINIT] DataSourceEc2.py[DEBUG]: Removed the following from metadata urls: ['http://instance-data.:8773']

From post_259_journal.log:
Dec 26 13:43:18 ip-172-31-27-166 python3[484]: [CLOUDINIT] DataSourceEc2.py[DEBUG]: Removed the following from metadata urls: ['http://instance-data.:8773']
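To illustrate which entries in the list depend on DNS at all, here is a small sketch. It is not cloud-init code: `needs_dns` is a hypothetical helper, and it only classifies URLs by whether their host is an IP literal, without doing any lookup.

```python
import ipaddress
from urllib.parse import urlparse


def needs_dns(url: str) -> bool:
    """Return True if the URL's host is a name rather than an IP literal."""
    # urlparse() strips the brackets from IPv6 literals for us.
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return False
    except ValueError:
        return True


metadata_urls = [
    "http://169.254.169.254",
    "http://[fd00:ec2::254]",
    "http://instance-data.:8773",
]

# Only the legacy hostname entry would ever reach the resolver.
dns_dependent = [url for url in metadata_urls if needs_dns(url)]
```

Under these assumptions, `dns_dependent` contains only the "instance-data." entry, which is the same URL the journal logs show being dropped.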

I also checked the documentation for the "clouds" identified at the beginning of DataSourceEc2.py (line 36, class CloudNames: ...). None of them refer to "instance-data."; they mention only "169.254.169.254":

Brightbox: https://www.brightbox.com/docs/reference/metadata-service/
Zstack: https://cloudinit.readthedocs.io/en/24.1/reference/datasources/zstack.html
e24cloud: https://www.e24cloud.com/en/e24cloud-servers/meta-data/
Outscale: https://docs.outscale.com/en/userguide/Accessing-the-Metadata-and-User-Data-of-a-VM.html
Tilaa: https://support.tilaa.com/hc/en-us/articles/228652587-Using-the-VPS-Metadata-Service-169-254-169-254

I am aware that there is an "UNKNOWN" category for when the cloud cannot be identified. However, I don't think it's reasonable to assume that "instance-data.:8773" is a valid endpoint for a cloud that falls into this category.

There is also this old bug report (somewhat related): https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/2039723

Where it is mentioned that:

"HOWEVER, the name as given in the list above "instance-data." is trying to do the "hostnames ending in a '.' are fully qualified" thing, but in fact that name in AWS is not fully qualified. Instead, it requires the AWS region-specific local domain be appended"

Which indicates that "instance-data.:8773" will (at least on EC2) always fail DNS resolution - no matter what.

It is also theorized later on:

"(I wonder if this was originally done to support EC2-Classic? Detection of Classic instances is handled elsewhere, and AWS dropped supported for Classic networking in 2022 having migrated all such instances to a VPC. So if "instance-data." is a remnant of that era, it should be migrated also by removing the trailing dot.)"

This indicates that "instance-data.:8773" most likely is a relic that can be safely removed.

That being said, if reviewers still feel it should not be removed, it can be left in. There is a workaround: it is possible to provide a "metadata_urls" list through the cloud-init config files that does not contain "instance-data.:8773" - ref.: https://cloudinit.readthedocs.io/en/latest/reference/datasources/ec2.html

That workaround still requires the change to is_resolvable() to exit early on IP addresses.
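A rough sketch of how such an override takes effect, modelled on the mcfg.get("metadata_urls", self.metadata_urls) line quoted earlier. The names `DEFAULT_METADATA_URLS` and `effective_metadata_urls` are hypothetical, and the dict stands in for the datasource config that the linked documentation places under datasource: Ec2: in the cloud-init config files.

```python
# Defaults as they appear in the diff hunk above (before this PR).
DEFAULT_METADATA_URLS = [
    "http://169.254.169.254",
    "http://[fd00:ec2::254]",
    "http://instance-data.:8773",
]


def effective_metadata_urls(ds_cfg: dict) -> list:
    """Return the configured URL list, or the defaults if none is set."""
    return ds_cfg.get("metadata_urls", DEFAULT_METADATA_URLS)


# Hypothetical user config that omits the DNS-dependent entry.
user_cfg = {
    "metadata_urls": [
        "http://169.254.169.254",
        "http://[fd00:ec2::254]",
    ]
}

urls_with_override = effective_metadata_urls(user_cfg)
urls_without_override = effective_metadata_urls({})
```

With the override in place, the hostname entry never reaches the resolvability filter; without it, the defaults apply unchanged.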

Member

Thanks for digging into this, @drzee99. I think that I am comfortable with removing it.

@drzee99 drzee99 requested a review from holmanb February 5, 2026 08:34
Member

@holmanb holmanb left a comment

Thanks @drzee99!

@holmanb holmanb merged commit 72809f8 into canonical:main Feb 5, 2026
22 checks passed
holmanb pushed a commit that referenced this pull request Feb 5, 2026
Fixes DNS queries for IP addresses that cause 2+ minute boot delays
with systemd 259+. Moves IP detection earlier in is_resolvable() and
removes legacy DNS-dependent metadata URL.

Fixes GH-6641
holmanb pushed a commit that referenced this pull request Feb 6, 2026
Fixes DNS queries for IP addresses that cause 2+ minute boot delays
with systemd 259+. Moves IP detection earlier in is_resolvable() and
removes legacy DNS-dependent metadata URL.

Fixes GH-6641