1.13.3: sundog fails to deserialize settings - invalid hostname #3031

Closed
nairb774 opened this issue Apr 18, 2023 · 3 comments
Labels
status/in-progress (This issue is currently being worked on), type/bug (Something isn't working)

Comments

@nairb774

Image I'm using:

Welcome to Bottlerocket OS 1.13.3 (aws-k8s-1.22)!

What I expected to happen:

1.13.3 will correctly boot and attach to the EKS cluster.

What actually happened:

sundog crashes with a log line similar to the following:

[ 5.385998] sundog[1939]: Error deserializing HashMap to Settings: Error deserializing scalar value: Unable to deserialize into ValidLinuxHostname: Invalid hostname 'ip-10-17-31-27.us-west-2.compute.internal abcdefghijk.com abcdefg.com abcdefgh-abcd.local nopqrstuvwx.com abcdefghijkl.com abcde.net abcdefghijklmnopqrs.com': must only be [0-9a-z.-], and 1-253 chars long

The hostname's length and shape are preserved; only the values were replaced. The crash happens in such a way that I can't connect to the box with Systems Manager to debug further. Thankfully, the system log has enough information.
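
The error message states the rule the value failed: only `[0-9a-z.-]`, 1-253 characters. As a rough illustration, here is a minimal Rust sketch of an equivalent check. `is_valid_linux_hostname` is a hypothetical name; it reproduces only the rule quoted in the log, not Bottlerocket's actual `ValidLinuxHostname` type. The joined value in the log is well under 253 characters, so it is the spaces between the domains that fail the check.

```rust
/// Hypothetical check mirroring the rule quoted in the sundog error:
/// only `[0-9a-z.-]`, and 1-253 characters long.
/// This is NOT Bottlerocket's actual `ValidLinuxHostname` implementation.
fn is_valid_linux_hostname(name: &str) -> bool {
    // Byte length equals char count here, because the character check
    // below only accepts ASCII.
    let len_ok = (1..=253).contains(&name.len());
    let chars_ok = name
        .chars()
        .all(|c| c.is_ascii_digit() || c.is_ascii_lowercase() || c == '.' || c == '-');
    len_ok && chars_ok
}

fn main() {
    // The first entry from the report: satisfies the rule.
    let single = "ip-10-17-31-27.us-west-2.compute.internal";
    // A shortened space-separated value like the one in the log:
    // the spaces are outside [0-9a-z.-], so the check fails.
    let multi = "ip-10-17-31-27.us-west-2.compute.internal abcdefghijk.com abcdefg.com";
    println!("{}", is_valid_linux_hostname(single));
    println!("{}", is_valid_linux_hostname(multi));
}
```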

Full Log
  Booting `Bottlerocket OS 1.13.3'


[    0.169397] RETBleed: WARNING: Spectre v2 mitigation leaves CPU vulnerable to RETBleed attacks, data leaks possible!

Welcome to Bottlerocket OS 1.13.3 (aws-k8s-1.22)!

[  OK  ] Created slice Slice /system/modprobe.
[  OK  ] Reached target Path Units.
[  OK  ] Reached target Slice Units.
[  OK  ] Reached target Swaps.
[  OK  ] Listening on Journal Audit Socket.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Listening on Journal Socket.
[  OK  ] Listening on udev Control Socket.
[  OK  ] Listening on udev Kernel Socket.
         Mounting Huge Pages File System...
         Mounting POSIX Message Queue File System...
         Mounting CNI Configuration Directory (/etc/cni)...
         Mounting Kernel Debug File System...
         Mounting Kernel Trace File System...
         Mounting Temporary Directory /tmp...
         Starting Load audit rules...
         Starting Checks and marks if boot has ever succeeded before...
         Starting Create List of Static Device Nodes...
         Starting Load Kernel Module configfs...
         Starting Load Kernel Module efi_pstore...
         Starting Load Kernel Module fuse...
         Starting Copy SELinux policy files...
         Starting Journal Service...
         Starting Load Kernel Modules...
         Starting Generate network units from Kernel command line...
         Starting Remount Root and Kernel File Systems...
         Starting Coldplug All udev Devices...
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Mounted CNI Configuration Directory (/etc/cni).
[  OK  ] Mounted Kernel Debug File System.
[  OK  ] Mounted Kernel Trace File System.
[  OK  ] Mounted Temporary Directory /tmp.
[  OK  ] Finished Load audit rules.
[  OK  ] Finished Create List of Static Device Nodes.
[  OK  ] Finished Load Kernel Module configfs.
[  OK  ] Finished Load Kernel Module efi_pstore.
[  OK  ] Finished Load Kernel Module fuse.
[  OK  ] Finished Copy SELinux policy files.
[  OK  ] Finished Checks and marks if boot has ever succeeded before.
[  OK  ] Finished Generate network units from Kernel command line.
[  OK  ] Finished Remount Root and Kernel File Systems.
         Mounting Containerd Configuration Directory (/etc/containerd)...
         Mounting Host containers Configurat…irectory (/etc/host-containers)...
         Mounting Kubernetes PKI private dir…y (/etc/kubernetes/pki/private)...
         Mounting AWS configuration directory (/root/.aws)...
         Mounting FUSE Control File System...
         Mounting Kernel Configuration File System...
         Starting Create System Users...
[  OK  ] Finished Load Kernel Modules.
[  OK  ] Mounted Containerd Configuration Directory (/etc/containerd).
[  OK  ] Mounted Host containers Configurati… Directory (/etc/host-containers).
[  OK  ] Mounted Kubernetes PKI private dire…ory (/etc/kubernetes/pki/private).
[  OK  ] Mounted AWS configuration directory (/root/.aws).
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Mounted Kernel Configuration File System.
         Starting Apply Kernel Variables...
[  OK  ] Started Journal Service.
[  OK  ] Finished Create System Users.
         Starting Create Static Device Nodes in /dev...
[  OK  ] Finished Apply Kernel Variables.
[  OK  ] Finished Create Static Device Nodes in /dev.
[  OK  ] Reached target Preparation for Local File Systems.
[  OK  ] Set up automount EFI System Partition Automount.
         Starting Rule-based Manager for Device Events and Files...
[  OK  ] Finished Coldplug All udev Devices.
[  OK  ] Started Rule-based Manager for Device Events and Files.
[  OK  ] Found device Amazon Elastic Block Store 13.
[  OK  ] Found device Amazon Elastic Block Store BOTTLEROCKET-PRIVATE.
[  OK  ] Found device Amazon Elastic Block Store 1.
         Starting Repart preferred data partition...
[  OK  ] Finished Repart preferred data partition.
[  OK  ] Found device Amazon Elastic Block Store BOTTLEROCKET-DATA.
         Starting Prepare Local Filesystem (/local)...
[  OK  ] Stopped Repart preferred data partition.
[  OK  ] Finished Prepare Local Filesystem (/local).
         Mounting Local Directory (/local)...
[  OK  ] Mounted Local Directory (/local).
         Starting Resize Data Partition...
         Mounting Mnt Directory (/mnt)...
         Mounting Opt Directory (/opt)...
         Mounting Var Directory (/var)...
[  OK  ] Mounted Mnt Directory (/mnt).
[  OK  ] Mounted Opt Directory (/opt).
[  OK  ] Mounted Var Directory (/var).
         Mounting Private Directory (/var/lib/bottlerocket)...
         Starting Mask Local Mnt Directory (/local/mnt)...
         Starting Mask Local Opt Directory (/local/opt)...
         Starting Mask Local Var Directory (/local/var)...
         Starting Prepare Opt Directory (/opt)...
         Starting Prepare Containerd Directory (/var/lib/containerd)...
         Starting Prepare Kubelet Directory (/var/lib/kubelet)...
         Starting Prepare Var Directory (/var)...
         Starting Flush Journal to Persistent Storage...
         Starting Load/Save Random Seed...
[  OK  ] Mounted Private Directory (/var/lib/bottlerocket).
[  OK  ] Finished Mask Local Mnt Directory (/local/mnt).
[  OK  ] Finished Mask Local Opt Directory (/local/opt).
[  OK  ] Finished Mask Local Var Directory (/local/var).
[  OK  ] Finished Prepare Opt Directory (/opt).
[  OK  ] Finished Prepare Containerd Directory (/var/lib/containerd).
[  OK  ] Finished Load/Save Random Seed.
[  OK  ] Finished Prepare Kubelet Directory (/var/lib/kubelet).
[  OK  ] Reached target First Boot Complete.
[  OK  ] Finished Prepare Var Directory (/var).
         Mounting CNI Plugin Directory (/opt/cni/bin)...
         Mounting Kernel Modules (Read-Write)...
[  OK  ] Finished Flush Journal to Persistent Storage.
[  OK  ] Mounted CNI Plugin Directory (/opt/cni/bin).
[  OK  ] Mounted Kernel Modules (Read-Write).
[  OK  ] Reached target Local File Systems.
         Mounting Kernel Development Sources (Read-Only)...
         Mounting License files...
         Starting Commit a transient machine-id on disk...
         Starting Create Volatile Files and Directories...
[  OK  ] Finished Commit a transient machine-id on disk.
[  OK  ] Finished Create Volatile Files and Directories.
         Starting Rebuild Dynamic Linker Cache...
         Starting Rebuild Journal Catalog...
[  OK  ] Finished Rebuild Journal Catalog.
[  OK  ] Finished Rebuild Dynamic Linker Cache.
         Starting Update is Completed...
[  OK  ] Finished Update is Completed.
[  OK  ] Reached target System Initialization.
[  OK  ] Started Scheduled Metricdog Pings.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Reached target Timer Units.
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Socket Units.
         Starting D-Bus System Message Bus...
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Reached target Basic System.
         Starting ACPI event daemon...
         Starting Generate network configuration...
         Starting Call signpost to mark the …r all required targets are met....
         Starting Bottlerocket data store migrator...
[  OK  ] Started ACPI event daemon.
[    2.840982] migrator[1851]: Data store does not exist at given path, exiting (/var/lib/bottlerocket/datastore/current)
[  OK  ] Finished Call signpost to mark the …ter all required targets are met..
[  OK  ] Finished Bottlerocket data store migrator.
[  OK  ] Finished Generate network configuration.
[  OK  ] Reached target Preparation for Network.
         Starting Prepare Boot Directory (/boot)...
         Starting Datastore creator...
         Starting wicked DHCPv4 supplicant service...
         Starting wicked DHCPv6 supplicant service...
[  OK  ] Mounted Kernel Development Sources (Read-Only).
[  OK  ] Mounted License files.
[  OK  ] Finished Datastore creator.
         Starting Bottlerocket API server...
[  OK  ] Finished Prepare Boot Directory (/boot).
         Starting Disable kexec load syscalls...
[  OK  ] Finished Resize Data Partition.
[  OK  ] Finished Disable kexec load syscalls.
[  OK  ] Started Bottlerocket API server.
[  OK  ] Started wicked DHCPv4 supplicant service.
[  OK  ] Started wicked DHCPv6 supplicant service.
         Starting wicked network management service daemon...
[  OK  ] Started wicked network management service daemon.
         Starting wicked network nanny service...
[  OK  ] Started wicked network nanny service.
         Starting wicked managed network interfaces...
         Mounting Kernel Development Sources (Read-Write)...
[  OK  ] Mounted Kernel Development Sources (Read-Write).
[  OK  ] Finished wicked managed network interfaces.
[  OK  ] Reached target Network.
[  OK  ] Reached target Network is Online.
         Starting Bottlerocket userdata configuration system...
[  OK  ] Finished Bottlerocket userdata configuration system.
         Starting User-specified setting generators...
[    5.385998] sundog[1939]: Error deserializing HashMap to Settings: Error deserializing scalar value: Unable to deserialize into ValidLinuxHostname: Invalid hostname 'ip-10-17-31-27.us-west-2.compute.internal abcdefghijk.com abcdefg.com abcdefgh-abcd.local nopqrstuvwx.com abcdefghijkl.com abcde.net abcdefghijklmnopqrs.com': must only be [0-9a-z.-], and 1-253 chars long
[FAILED] Failed to start User-specified setting generators.
See 'systemctl status sundog.service' for details.
[DEPEND] Dependency failed for Applies settings to create config files.
[DEPEND] Dependency failed for Send signal to CloudFormation Stack.
[DEPEND] Dependency failed for Bottlerocket initial configuration complete.
[DEPEND] Dependency failed for Isolates configured.target.
[DEPEND] Dependency failed for Sets the hostname.

How to reproduce the problem:

It appears to be related to configuring the VPC with a number of domains in the domain-name list of the DHCP options set.

I can't defend the configuration as it stands today (it causes other annoyances), but it seems to be a valid configuration per the documentation:

Some Linux operating systems accept multiple domain names separated by spaces. However, other Linux operating systems and Windows treat the value as a single domain, which results in unexpected behavior. If your DHCP option set is associated with a VPC that contains instances that are not all running the same operating systems, specify only one domain name.

nairb774 added the status/needs-triage (Pending triage or re-evaluation) and type/bug (Something isn't working) labels Apr 18, 2023
@etungsten
Contributor

etungsten commented Apr 18, 2023

Hi @nairb774,

Thanks for reporting this. We'll need to take a closer look. The hostname setting is generated by netdog, which queries IMDS for meta-data/local-hostname. Apparently that can return a list of hostnames based on the list of domain names in the DHCP options?
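
If `meta-data/local-hostname` can indeed return several space-separated entries, as the log suggests, one defensive approach is to treat the first entry as the hostname and keep the rest as extra domains. The sketch below assumes that shape; `split_local_hostname` is a hypothetical helper, and this is not necessarily what the eventual fix does.

```rust
/// Hypothetical helper: split a raw IMDS `local-hostname` body into the
/// first entry (used as the hostname) and any remaining space-separated
/// entries. Returns None if the body contains no entries at all.
fn split_local_hostname(body: &str) -> Option<(&str, Vec<&str>)> {
    // split_whitespace also drops a trailing newline and repeated spaces.
    let mut parts = body.split_whitespace();
    let first = parts.next()?;
    Some((first, parts.collect()))
}

fn main() {
    // Shortened example body in the shape seen in the sundog error.
    let body = "ip-10-17-31-27.us-west-2.compute.internal abcdefghijk.com abcde.net\n";
    match split_local_hostname(body) {
        Some((host, extra)) => {
            println!("hostname: {host}");
            println!("extra entries: {extra:?}");
        }
        None => eprintln!("IMDS returned an empty local-hostname"),
    }
}
```

Keeping the remaining entries instead of discarding them leaves the option of using them elsewhere, for example as DNS search domains, rather than silently losing them.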

Interestingly, the hostnames after ip-10-17-31-27.us-west-2.compute.internal don't include the IPv4 address component, just the domain name.

zmrow self-assigned this Apr 18, 2023
@nairb774
Author

Yeah, it looks like the rest of the domains are appended to the search line of /etc/resolv.conf in containers (1), and I assume on the host system as well. The DHCP options are set at the VPC level, and changing them has a broader impact than just the EKS clusters, which complicates things, especially with a bunch of "legacy" systems floating about.

(1) Granted, the list I gave above is so long that it runs into other Kubernetes limitations, but thankfully that isn't impacting workloads on the EKS hosts. If anything, I'd love not to carry all those extra domains forward so I can clean house during the migration. I can dream :)
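
The observation that the extra domains end up on the resolv.conf search line is consistent with a resolver setup that splits the DHCP `domain-name` value on whitespace and emits each domain as a search entry. A minimal sketch of that mapping, assuming a plain whitespace split and an illustrative option value; `search_line` is hypothetical and not taken from wicked, Bottlerocket, or Kubernetes.

```rust
/// Hypothetical: turn a space-separated DHCP `domain-name` value into a
/// resolv.conf `search` line. Returns None when there are no domains.
fn search_line(domain_name: &str) -> Option<String> {
    let domains: Vec<&str> = domain_name.split_whitespace().collect();
    if domains.is_empty() {
        return None;
    }
    Some(format!("search {}", domains.join(" ")))
}

fn main() {
    // Assumed option value: the regional default domain followed by
    // extra domains, shortened from the ones in the report.
    let option = "us-west-2.compute.internal abcdefghijk.com abcdefg.com";
    if let Some(line) = search_line(option) {
        println!("{line}");
    }
}
```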

zmrow added the status/in-progress (This issue is currently being worked on) label and removed the status/needs-triage (Pending triage or re-evaluation) label Apr 19, 2023
@bcressey
Contributor

1.13.4 is out with the backported fix in 5bf14ac.
