User-defined meta-data can create malformed YAML #13853

Open
3 tasks done
holmanb opened this issue Aug 1, 2024 · 23 comments
Labels
Bug (Confirmed to be a bug), Feature (New feature, not a bug)
Milestone
lxd-6.2
Comments

@holmanb
Member

holmanb commented Aug 1, 2024

Required information

  • Distribution: Ubuntu Noble
  • The output of "snap list --all lxd core20 core22 core24 snapd":
Name    Version      Rev    Tracking       Publisher   Notes
core20  20240227     2264   latest/stable  canonical✓  base,disabled
core20  20240416     2318   latest/stable  canonical✓  base
core22  20240111     1122   latest/stable  canonical✓  base,disabled
core22  20240408     1380   latest/stable  canonical✓  base
lxd     6.1-0d4d89b  29469  latest/stable  canonical✓  disabled
lxd     6.1-c14927a  29551  latest/stable  canonical✓  -
snapd   2.62         21465  latest/stable  canonical✓  snapd,disabled
snapd   2.63         21759  latest/stable  canonical✓  snapd
  • The output of "lxc info" or if that fails:
config: {}
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- init_preseed_storage_volumes
- metrics_instances_count
- server_instance_type_info
- resources_disk_mounted
- server_version_lts
- oidc_groups_claim
- loki_config_instance
- storage_volatile_uuid
- import_instance_devices
- instances_uefi_vars
- instances_migration_stateful
- container_syscall_filtering_allow_deny_syntax
- access_management
- vm_disk_io_limits
- storage_volumes_all
- instances_files_modify_permissions
- image_restriction_nesting
- container_syscall_intercept_finit_module
- device_usb_serial
- network_allocate_external_ips
- explicit_trust_token
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: holmanb
auth_user_method: unix
environment:
  addresses: []
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIB+zCCAYCgAwIBAgIQdDx+LXwGuHE6lUh7Eidt5jAKBggqhkjOPQQDAzAxMRww
    GgYDVQQKExNsaW51eGNvbnRhaW5lcnMub3JnMREwDwYDVQQDDAhyb290QGFyYzAe
    Fw0yMTEwMjExNDE4MzlaFw0zMTEwMTkxNDE4MzlaMDExHDAaBgNVBAoTE2xpbnV4
    Y29udGFpbmVycy5vcmcxETAPBgNVBAMMCHJvb3RAYXJjMHYwEAYHKoZIzj0CAQYF
    K4EEACIDYgAEZVKG/5oSol3bL/KYIaIag7xM7QEAUe0KsNcW44JNMRWWjKEC1bYy
    RPf7dabQywL2pNeiWYUPpXtEzQEMthpCrFH1tYWwCxbab0I8xXP5nio+qyEoZ76B
    qIwept8PNb9xo10wWzAOBgNVHQ8BAf8EBAMCBaAwEwYDVR0lBAwwCgYIKwYBBQUH
    AwEwDAYDVR0TAQH/BAIwADAmBgNVHREEHzAdggNhcmOHBH8AAAGHEAAAAAAAAAAA
    AAAAAAAAAAEwCgYIKoZIzj0EAwMDaQAwZgIxAMLGwnrmbcb2QpQusAGqqYR7/tri
    dnZFXK0w7sbpndc+9XMuoKpEf9VOVCh90EQtdgIxAOJeO3egwenHJ9S4CVyrK0ON
    lKbu/QQBW0XJ77VVIKKP/OIOyAIJXncOkOxip5XMEQ==
    -----END CERTIFICATE-----
  certificate_fingerprint: 78d858acdbbb797d36863a910368bc41311b2c5eb1c3b11287c0966c7f58c962
  driver: qemu | lxc
  driver_version: 8.2.1 | 6.0.0
  instance_types:
  - virtual-machine
  - container
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.8.0-38-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "24.04"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: arc
  server_pid: 172560
  server_version: "6.1"
  server_lts: false
  storage: dir
  storage_version: "1"
  storage_supported_drivers:
  - name: cephfs
    version: 17.2.7
    remote: true
  - name: cephobject
    version: 17.2.7
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.48.0
    remote: false
  - name: powerflex
    version: 1.16 (nvme-cli)
    remote: true
  - name: zfs
    version: 2.2.2-0ubuntu9
    remote: false
  - name: btrfs
    version: 5.16.2
    remote: false
  - name: ceph
    version: 17.2.7
    remote: true

Issue description

Problem

The user-defined meta-data key gets appended as a string to the LXD-provided meta-data. This means that duplicate keys can be introduced, which creates a configuration that isn't well defined. Both versions 1.1 and 1.2 of the YAML spec require mapping keys to be unique, so such a document violates the spec.

The configuration received by cloud-init:

{'_metadata_api_version': '1.0',
 'config': {'user.meta-data': 'instance-id: test_2'},
 'devices': {'eth0': {'hwaddr': '00:16:3e:e3:ed:2c',
                      'name': 'eth0',
                      'network': 'lxdbr0',
                      'type': 'nic'},
             'root': {'path': '/', 'pool': 'default', 'type': 'disk'}},
 'meta-data': '#cloud-config\n'
              'instance-id: 0b6c31e2-403c-44eb-b610-ad7eafea777e\n'
              'local-hostname: oracular\n'
              'instance-id: test_2'}

Cloud-init's implementation uses PyYAML, which happens to keep the last defined key. This produces the desired outcome (the user can override the default meta-data), but it relies on behavior that the YAML spec leaves undefined and that is specific to one library. If cloud-init were ever to move to a different YAML library, this behavior could break or would need to be manually worked around.
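
For illustration, a minimal Python check of that behavior using the meta-data string from the dump above (illustrative only, not cloud-init's code):

import yaml

doc = """\
#cloud-config
instance-id: 0b6c31e2-403c-44eb-b610-ad7eafea777e
local-hostname: oracular
instance-id: test_2
"""

# PyYAML accepts the duplicate key without complaint; the last
# occurrence wins, so the user-supplied instance-id takes effect.
print(yaml.safe_load(doc)["instance-id"])  # -> test_2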

To create a path to standards-compliant YAML while preserving the current behavior and backwards compatibility, we could do the following:

  1. cloud-init could be updated to make values in metadata['config']['user.meta-data'] override values in metadata['meta-data'] (see the sketch after this list). This wouldn't change cloud-init's observable behavior, since it currently ignores the values in metadata['config']. We could optionally check for a bump to the value in _metadata_api_version before doing this, but that wouldn't be strictly required since the result is currently functionally identical.

  2. Once stable distributions have this update, we could update the API to no longer append user meta-data to the default meta-data (and bump the meta-data API version, if desired). While we're making this change, we might want to drop the #cloud-config comment too; the comment isn't necessary because meta-data isn't part of cloud-config.
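
A rough sketch of option 1 in Python, assuming the dict structure shown above (names are illustrative and error handling is simplified; this is not cloud-init's actual implementation):

import yaml

def merged_meta_data(lxd_metadata: dict) -> dict:
    # LXD-provided defaults (instance-id, local-hostname, ...)
    defaults = yaml.safe_load(lxd_metadata.get("meta-data") or "") or {}
    # User-provided overrides from config['user.meta-data']
    user = yaml.safe_load(
        lxd_metadata.get("config", {}).get("user.meta-data") or ""
    ) or {}
    # User keys win, without duplicate keys ever appearing in one document.
    return {**defaults, **user}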

canonical/cloud-init#5575

Information to attach

  • Container log (lxc info NAME --show-log)
Name: cloudinit-0801-1919380a56vdl6
Status: RUNNING
Type: container
Architecture: x86_64
PID: 1040232
Created: 2024/08/01 13:19 MDT
Last Used: 2024/08/01 13:42 MDT

Resources:
  Processes: 69
  CPU usage:
    CPU usage (in seconds): 6
  Memory usage:
    Memory (current): 83.53MiB
    Swap (current): 28.00KiB
  Network usage:
    eth0:
      Type: broadcast
      State: UP
      Host interface: vethd9b8b75f
      MAC address: 00:16:3e:9a:8b:f6
      MTU: 1500
      Bytes received: 115.82kB
      Bytes sent: 5.29kB
      Packets received: 454
      Packets sent: 52
      IP addresses:
        inet:  10.161.80.194/24 (global)
        inet6: fd42:80e2:4695:1e96:216:3eff:fe9a:8bf6/64 (global)
        inet6: fe80::216:3eff:fe9a:8bf6/64 (link)
    lo:
      Type: loopback
      State: UP
      MTU: 65536
      Bytes received: 404B
      Bytes sent: 404B
      Packets received: 4
      Packets sent: 4
      IP addresses:
        inet:  127.0.0.1/8 (local)
        inet6: ::1/128 (local)

Log:

lxc cloudinit-0801-1919380a56vdl6 20240801194228.855 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194228.855 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194228.857 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194228.857 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194243.782 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194243.782 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194243.795 ERROR    attach - ../src/src/lxc/attach.c:lxc_attach_run_command:1841 - No such file or directory - Failed to exec "user.meta-data"
lxc cloudinit-0801-1919380a56vdl6 20240801194325.518 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194325.518 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194417.803 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194417.803 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801195046.604 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801195046.604 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801201625.883 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801201625.883 WARN     idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
  • Container configuration (lxc config show NAME --expanded)
architecture: x86_64
config:
  image.architecture: x86_64
  image.description: Ubuntu 20.04 LTS server (20240730)
  image.os: ubuntu
  image.release: focal
  limits.cpu.allowance: 50%
  user.meta-data: 'instance-id: test_2'
  volatile.base_image: c19cc6a8469b596aae092a3953e326ed01e1183a25bff1d26145a85a2272767e
  volatile.cloud-init.instance-id: 7d26c435-da56-405c-9b04-9ad98f550736
  volatile.eth0.host_name: vethd9b8b75f
  volatile.eth0.hwaddr: 00:16:3e:9a:8b:f6
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: a097111b-15e4-45e4-aa31-a6da707012a8
  volatile.uuid.generation: a097111b-15e4-45e4-aa31-a6da707012a8
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
  • Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
time="2024-07-22T10:13:23-06:00" level=warning msg=" - Couldn't find the CGroup network priority controller, per-instance network priority will be ignored. Please use per-device limits.priority instead"
time="2024-07-31T07:48:45-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
time="2024-07-31T07:49:29-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
time="2024-07-31T07:50:07-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
time="2024-07-31T07:50:07-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
time="2024-07-31T07:50:26-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
time="2024-07-31T07:50:33-06:00" level=warning msg="Skipping AppArmor for dnsmasq due to raw.dnsmasq being set" driver=bridge name=lxdbr0 network=lxdbr0 project=default
@tomponline
Member

Hi @holmanb

I'm afraid I'm not really following what it is that LXD needs to change here?

Also, not sure if relevant, but using the user.* prefix is deprecated for cloud-init config and the currently supported keys start with cloud-init.; see https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#instance-options-cloud-init

@tomponline tomponline added the Incomplete Waiting on more information from reporter label Aug 2, 2024
@holmanb
Member Author

holmanb commented Aug 2, 2024

Thanks for the response @tomponline!

I'm afraid im not really following what it is that LXD needs to change here?

This is the offending line. See the commit on this branch for the change that I am proposing.

I'm happy to submit a PR for this, but we need to release a change in cloud-init first to accommodate this expectation. This is why I filed a bug report rather than just a PR - I want to make sure that the proposed solution is acceptable before moving forward with it.

Also, not sure if relevant, but using the user.* prefix is deprecated for cloud-init config and the current support keys start with cloud-init., see https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#instance-options-cloud-init

The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

$ lxc launch ubuntu:noble me -c cloud-init.meta-data='instance-id: test_1'
Creating me
Error: Failed instance creation: Failed creating instance record: Unknown configuration key: cloud-init.meta-data
$ lxc launch ubuntu:noble me -c user.meta-data='instance-id: test_1'
Creating me
Starting me                               

similarly:

$ lxc config set me user.meta-data='instance-id: test_1'
$ lxc config set me cloud-init.meta-data='instance-id: test_1'
Error: Invalid config: Unknown configuration key: cloud-init.meta-data

If you want to deprecate the user.meta-data key as well for uniformity, I could potentially make cloud-init support a new cloud-init.meta-data key while making this change. Let me know.

@tomponline
Member

I'm happy to submit a PR for this, but we need to release a change in cloud-init first to accommodate this expectation. This is why I filed a bug report rather than just a PR - I want to make sure that the proposed solution is acceptable before moving forward with it.

Thanks!

Will this break users of LXD guests with older versions of cloud-init?

@tomponline
Member

The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

Hrm, that is curious, I wasn't expecting that, but I'd need to dig into the commit history and original pull requests to try and understand why this wasn't originally changed to have a cloud-init. prefix like the other keys, as it seems like it should have been.

@tomponline
Member

tomponline commented Aug 5, 2024

The comment isn't necessary because meta-data isn't part of cloud-config.

Please could you explain this statement. I'm confused why a key being used by cloud-init isn't part of cloud-config?

@tomponline
Member

To create a path to standards-compliant YAML while preserving the current behavior and backwards compatibility, we could do the following:

1. cloud-init could be updated to make values in `metadata['config']['user.meta-data']` override values in `metadata['meta-data']`. This wouldn't change cloud-init's observable behavior, since it currently ignores the values in `metadata['config']`. We could optionally check for a bump to the value in `_metadata_api_version` before doing this, but that wouldn't be strictly required since the result is currently functionally identical.

2. Once stable distributions have this update, we could update the API to no longer append user meta-data to the default meta-data (and bump the meta-data API version, if desired). While we're making this change, we might want to drop the `#cloud-config` comment too; the comment isn't necessary because meta-data isn't part of cloud-config.

I suspect we'll need option 1. at least, and then potentially land the proposed change in 2. only for the 6.x series of LXD.

@holmanb
Member Author

holmanb commented Aug 5, 2024

Will this break users of LXD guests with older versions of cloud-init?

This would break any user who provides a custom instance-id (currently a duplicate key) on an older version of cloud-init, since cloud-init would then see the LXD-provided instance-id where it previously saw the user's override.

From a cloud-init perspective, fixes for bugs come in new releases, so the typical stability/support recommendation is "upgrade to the latest version". If we want to avoid breaking old instances, I could probably update the proposal I made above to increment the API revision number.

The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

Hrm, that is curious, I wasn't expecting that, but I'd need to dig into the commit history and original pull requests to try and understand why this wasn't originally changed to have a cloud-init. prefix like the other keys, as it seems like it should have been.

Agreed. Let me know if you'd like to go that route.

The comment isn't necessary because meta-data isn't part of cloud-config.

Please could you explain this statement. I'm confused why a key being used by cloud-init isn't part of cloud-config?

Cloud-config isn't required for any of the keys: vendor-data, user-data, or meta-data.

Cloud-config is just one of cloud-init's configuration formats. There are several configuration formats available for user-data and vendor-data, including cloud-config and even plain shell scripts:

config:
...
  user.user-data: |
    #!/usr/bin/bash
    echo hello | tee -a /tmp/example.txt

With the above example a user would see:

$ lxc exec me -- cat /tmp/example.txt
hello

User-data is provided by the user for the purpose of configuring an instance. Vendor-data is likewise intended to be provided by the cloud/vendor for the purpose of configuring an instance with cloud-specific information. Both vendor-data and user-data can be any of the multiple configuration formats mentioned above.

Meta-data doesn't follow any of the above formats, and is not intended to be a configuration format for the instance. Instead, it is supposed to tell cloud-init just a few pieces of information about the instance: its instance_id, region, etc. The lines are blurred a bit because a couple of the keys that it supports overlap with cloud-config. One of the overlapping keys is local-hostname, which is used by LXD and probably adds to the confusion here. Neither key is defined in cloud-init's cloud-config schema.

I suspect we'll need option 1. at least, and then potentially land the proposed changed in 2. for only the 6.x series of LXD.

That sounds fine by me. Let me know if my responses here or further digging revealed anything new that suggests that we shouldn't go forward with this proposal. This PR is my proposal for option 1, if you'd like to take a look.

@tomponline
Member

@holmanb Hi, would you mind booking a meeting to discuss this issue? Thanks

@holmanb
Member Author

holmanb commented Sep 17, 2024

@holmanb Hi, would you mind booking a meeting to discuss this issue? Thanks

I just saw this when checking back on the status of this. I'd be happy to.

@tomponline
Member

tomponline commented Sep 25, 2024

Thanks for the call @holmanb

As discussed, you can change the instance-id exposed to cloud-init via LXD's devlxd metadata API (https://documentation.ubuntu.com/lxd/en/latest/dev-lxd/#meta-data) by changing volatile.cloud-init.instance-id see:

https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#instance-volatile:volatile.cloud_init.instance-id

To change local-hostname, rename the instance.

I also think we should entirely remove the user.meta-data key from LXD's code base, as it is currently undocumented and appears to have been due to be removed in LXD 4.21 but was not, apparently due to an oversight:

I believe there is also a user.meta-data config key which is tied to cloud-init. Did we just forget to mention it here and in the issue, or must this remain as user.meta-data?

We will not keep that configuration key moving forward. It’s always been a very odd one with no real use cases, so it will just go away completely.

https://discuss.linuxcontainers.org/t/lxd-first-class-cloud-init-support/12559/18

See also
https://discuss.linuxcontainers.org/t/lxd-4-21-has-been-released/12860#reworked-cloud-init-support-4

Removed from docs here:

As far as I understood, it's because there's no reason for using it - the user.meta-data was originally added to set the instance name, but that isn't necessary anymore (and also doesn't work).

#11433 (comment)

There is also an issue confirming its removal here (although there's some confusion between user.user-data and user.meta-data in that thread):

#10417

@tomponline tomponline added Bug Confirmed to be a bug and removed Incomplete Waiting on more information from reporter labels Sep 25, 2024
@tomponline tomponline added this to the lxd-6.2 milestone Sep 25, 2024
@holmanb
Member Author

holmanb commented Sep 25, 2024

Thanks @tomponline for discussing. The volatile key and instance rename should meet our needs.

Cloud-init has one test, which I recently added, that depends on setting the instance ID via the user.meta-data key. I will update it to use the volatile key later today; it is a trivial change.

I just submitted a PR against cloud-init to update cloud-init's lxd documentation per our conversation.

@blackboxsw

blackboxsw commented Sep 25, 2024

@holmanb @tomponline we have a second use case for user.meta-data in integration testing of LXD: it allows cloud-init to inject a default SSH public-keys configuration into all images launched in a profile, without colliding with or being overwritten by the cloud-init.user-data provided to a system at launch. This now-undocumented feature that LXD provides in user.meta-data is reminiscent of the behavior of clouds like Azure, EC2, and OpenStack, which allow project owners or teams to set per-project SSH public keys that are authorized for SSH into those VMs. If user.meta-data goes away, then at minimum the integration test runners for Ubuntu Pro and cloud-init will have to make those tests requiring SSH use cloud-init.user-data or cloud-init.vendor-data to set up such authorized keys.

If the ability to set user.meta-data disappears in the future, I wonder whether there should instead be a feature request for an lxc config key public-ssh-keys within profile configuration. Such a feature would be easy to plumb through to cloud-init based images via 1.0/meta-data in devlxd, but likely complex for images without cloud-init.

@holmanb
Member Author

holmanb commented Sep 25, 2024

@blackboxsw thanks for catching that; I had missed that platform SSH keys can be provided in meta-data.

For completeness I double checked the other potential users of meta-data. Here are the references to arbitrary meta-data keys that I see in cloudinit/sources/__init__.py:

cloud-name - allows the cloud to define its own cloud-id at runtime
launch-index - used by clouds that need user-data filtered by launch-index (EC2)
availability-zone / availability_zone / placement - used for setting mirrors by region (EC2 and some other clouds)

None of these appear to be used by our LXD datasource code, nor by any of our tests in cloud-init or pycloudlib, so it looks to me like public-ssh-keys is the only requirement blocking LXD from removing user.meta-data.

@tomponline My apologies, I missed this requirement. It seems that user.meta-data is still needed by cloud-init for the time being.

As @blackboxsw suggested, if lxd were to provide the ssh key some other way (such as with a new key volatile.cloud_init.public-keys similar to the volatile.cloud_init.instance-id key), then cloud-init could switch to use that and stop using the user.meta-data key.

@tomponline
Member

what form would volatile.cloud_init.public-keys take?

@holmanb
Member Author

holmanb commented Sep 26, 2024

what form would volatile.cloud_init.public-keys take?

Regarding applying these settings, we would need to be able to set this value in a profile (preferred) or on an instance before launch (less preferred, but workable). If I understand correctly, both of these expectations are true for the other volatile keys.

Regarding the datatype, it would be best if this key could contain either a string or a list of strings. That would ensure that users can continue inserting either a single public key or multiple keys. If only one datatype is preferred, that would probably be fine too (as just a list of strings), but it would require some changes in pycloudlib to accommodate.

Regarding the upgrade path, we could make cloud-init and pycloudlib fall back to setting user.meta-data in the event that setting volatile.cloud_init.public-keys fails. This would bridge the gap between old and new versions for a seamless rollout.
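
A rough sketch of that fallback, assuming the proposed key name and driving the lxc CLI from Python (set_public_key and the exact error handling are hypothetical, not pycloudlib's actual API):

import subprocess

def set_public_key(instance: str, key: str) -> None:
    try:
        # Prefer the proposed volatile key; LXD rejects unknown config keys.
        subprocess.run(
            ["lxc", "config", "set", instance,
             f"volatile.cloud_init.public-keys={key}"],
            check=True,
            capture_output=True,
        )
    except subprocess.CalledProcessError:
        # Older LXD: fall back to the legacy user.meta-data path.
        subprocess.run(
            ["lxc", "config", "set", instance,
             f"user.meta-data=public-keys: {key}"],
            check=True,
        )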

@blackboxsw

blackboxsw commented Sep 26, 2024

No need to support two format types for this content if it adds unnecessary complexity to LXD for a volatile.cloud-init.public-keys setting. cloud-init's public-key processing currently supports strings, lists, dicts, and sets as values for public-keys provided by cloud platform meta-data. If the value were a string that contained newlines between SSH public keys, that'd work just fine: cloud-init will call splitlines() on that value.

For example, the following multi-line string value would allow cloud-init to import my two public keys:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSL7uWGj8cgWyIOaspgKdVy0cKJ+UTjfv7jBOjG2H/GN8bJVXy72XAvnhM0dUM+CCs8FOf0YlPX+Frvz2hKInrmRhZVwRSL129PasD12MlI3l44u6IwS1o/W86Q+tkQYEljtqDOo0a+cOsaZkvUNzUyEXUwz/lmYa6G4hMKZH4NBj7nbAAF96wsMCoyNwbWryBnDYUr6wMbjRR1J9Pw7Xh7WRC73wy4Va2YuOgbD3V/5ZrFPLbWZW/7TFXVrql04QVbyei4aiFR5n//GvoqwQDNe58LmbzX/xvxyKJYdny2zXmdAhMxbrpFQsfpkJ9E/H5w0yOdSvnWbUoG5xNGoOB csmith@fringe # ssh-import-id lp:chad.smith
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDvl3VfPjVXsXBsm6r2J+UneIMr4ZOJhQlXuBWTwzexbd/XugB3k5EXA18yyqjEVT+bApVwlxATY66drVUPBuZ2JMU1HuLOKhG6toZd7j042oV5b2TEvg0es9qxs9mtGzvMPf3mB3tBVY/ESall023M+J5JjGGSO4J3zM/9c+P3Hs7xyCjAoySZDN2VZzscPgSGZzck8xtyO39uPfscKXi9LJkkhDDG6SVWie5OeM8TxyH2W2eNDKeXid/qgdIxqRLSYiNnWpt9htI0SzahnFYtsw9VLkij+0cM29lBIGUr5AehN2Y6jetxODR3pZt4YqOiyC6D5NaEsVGKOb0zjIBBCso6mIseejlOwocSYUH21YnLDS2Mu31bHRmPjpRvMVTOFtnS2OkfOxYTyMNFZ5PH/a0/t3DGxZZqz74F+APxG1X0vsgSFA9yYzbBaY3fr3vNAEYsRMTeBIjF6Gx6QmX3/kw5KBid4t8qQCV4Z1l8UmWZu4qFYxV/Z0IYPZazgYy/1W0qfRm5AdvpDdH9XArIokwqe1E2Djp5/xWp4Z9dAINmfJvNZxiDJk7gQz+Hdka/1U/f3wQSds9OAjF+a94Lj+F9CmMrhpVEZG5OL8ysK4iwSOsDhW7iLeZw5AO7cVhDUWj53/p2FP4+zxin/tYkDhNTJF0Nhc2uLMLxRCOGrQ== csmith@uptown # ssh-import-id lp:chad.smith
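
A minimal illustration of that splitlines() handling (plain Python, not cloud-init's actual code path; the key material is truncated):

value = (
    "ssh-rsa AAAAB3Nz... csmith@fringe\n"
    "ssh-rsa AAAAB3Nz... csmith@uptown"
)
# One key per line; splitlines() plus a filter drops any blank entries.
keys = [line for line in value.splitlines() if line.strip()]
print(len(keys))  # -> 2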

Simplistically, I'm imagining something like this

--- a/lxd/devlxd.go
+++ b/lxd/devlxd.go
@@ -145,7 +145,7 @@ func devlxdMetadataGetHandler(d *Daemon, inst instance.Instance, w http.Response
 
        value := inst.ExpandedConfig()["user.meta-data"]
 
-       return response.DevLxdResponse(http.StatusOK, fmt.Sprintf("#cloud-config\ninstance-id: %s\nlocal-hostname: %s\n%s", inst.CloudInitID(), inst.Name(), value), "raw", inst.Type() == instancetype.VM)
+       return response.DevLxdResponse(http.StatusOK, fmt.Sprintf("#cloud-config\ninstance-id: %s\nlocal-hostname: %s\n%s%s", inst.CloudInitID(), inst.Name(), inst.CloudInitPublicKeys(), value), "raw", inst.Type() == instancetype.VM)
 }
 
 var devlxdEventsGet = devLxdHandler{
diff --git a/lxd/instance/drivers/driver_common.go b/lxd/instance/drivers/driver_common.go
index 9d547032a8..adc124451e 100644
--- a/lxd/instance/drivers/driver_common.go
+++ b/lxd/instance/drivers/driver_common.go
@@ -171,6 +171,16 @@ func (d *common) CloudInitID() string {
        return d.name
 }
 
+// CloudInitPublicKeys returns a newline-separated list of SSH authorized keys to configure for an instance, formatted as a meta-data line.
+func (d *common) CloudInitPublicKeys() string {
+       keys := d.LocalConfig()["volatile.cloud-init.public-keys"]
+       if keys != "" {
+               // Only emit the public-keys meta-data line when keys are configured.
+               keys = fmt.Sprintf("public-keys: %s\n", keys)
+       }
+       return keys
+}
+
 // Location returns instance's location.
 func (d *common) Location() string {
        return d.node

@holmanb
Member Author

holmanb commented Sep 26, 2024

No need to support two format types for this content if it adds unnecessary complexity to lxd for a volatile.cloud-init.public-keys setting. cloud-init public-key processing currently supports strings, lists, dicts, sets as values for public-keys provided by cloud platform meta-data. If it were a string that contained newlines between keys, that'd work just fine and cloud-init will call splitlines() on that value.

I'd prefer to avoid receiving structured inputs that then require additional parsing. Unnecessary parsing inevitably leads to bugs and introduces corner cases. If a single format type is preferred, I would lean slightly towards a list of strings. This would benefit not just cloud-init, with a simpler implementation, but also users, through more correct validation of inputs.

@blackboxsw

blackboxsw commented Sep 26, 2024

No need to support two format types for this content if it adds unnecessary complexity to lxd for a volatile.cloud-init.public-keys setting. cloud-init public-key processing currently supports strings, lists, dicts, sets as values for public-keys provided by cloud platform meta-data. If it were a string that contained newlines between keys, that'd work just fine and cloud-init will call splitlines() on that value.

I prefer if we can avoid receiving structured inputs which then require additional parsing. Unnecessary parsing inevitably leads to bugs and introduces corner cases. If a single format type is preferred, I would lean slightly towards a list of strings. This would benefit not just cloud-init with a simpler implementation, but also users because of more correct validation of inputs.

List of strings sounds good: it can easily be validated in lxd/instance/drivers/driver_common.go and presented as the YAML list expected by cloud-init's DataSourceLXD meta-data processing of public-keys. I also note that the existing lxd devlxdMetadataGetHandler doesn't need the leading #cloud-config (it's not cloud-config, and the comment header line is ignored for meta-data anyway).

@holmanb
Member Author

holmanb commented Sep 26, 2024

Related: I just submitted a PR because volatile.cloud_init.instance-id should actually be volatile.cloud-init.instance-id.

@tomponline
Member

No need to support two format types for this content if it adds unnecessary complexity to lxd for a volatile.cloud-init.public-keys setting. cloud-init public-key processing currently supports strings, lists, dicts, sets as values for public-keys provided by cloud platform meta-data. If it were a string that contained newlines between keys, that'd work just fine and cloud-init will call splitlines() on that value.

I prefer if we can avoid receiving structured inputs which then require additional parsing. Unnecessary parsing inevitably leads to bugs and introduces corner cases. If a single format type is preferred, I would lean slightly towards a list of strings. This would benefit not just cloud-init with a simpler implementation, but also users because of more correct validation of inputs.

List of strings sounds good: it can easily be validated in lxd/instance/drivers/driver_common.go and presented as the YAML list expected by cloud-init's DataSourceLXD meta-data processing of public-keys. I also note that the existing lxd devlxdMetadataGetHandler doesn't need the leading #cloud-config (it's not cloud-config, and the comment header line is ignored for meta-data anyway).

So LXD config options have no concept of "list of strings"; all config options are a single string.

However, they can contain commas, newlines, etc., so depending on the expected content of the string, selecting an appropriate delimiter is important (e.g. can commas appear in SSH keys?).

If the format is well understood by LXD, then we can validate it, split it, and deliver it to cloud-init in the desired format (i.e. a list of strings).

I would like to avoid having an undefined blob of data like the current meta-data setting, as it leads to the issues we've found, where the format is not well understood in all situations.

Regarding the upgrade path, we could make cloud-init and pycloudlib fall back to setting user.meta-data in the event that setting volatile.cloud_init.public-keys fails. This would bridge the gap between old and new versions for seamless rollout.

Sounds good.

we would need to be able to set this value in a profile (preferred) or on an instance before launch (less preferred, but workable).

volatile keys can only be set on the instance, not the profile, so I would suggest adding a new proper config key, such as security.ssh-keys.

Interestingly, we've recently received a request for something similar from elsewhere in Canonical, albeit without requiring cloud-init (LXD would set up the SSH keys in the guest). So if we did add a config key like this, and LXD set up the keys directly, would this data even need to be exported to cloud-init's metadata?

Of course we could do both, but would they then potentially conflict?

cc @mionaalex

@holmanb
Member Author

holmanb commented Sep 27, 2024

No need to support two format types for this content if it adds unnecessary complexity to lxd for a volatile.cloud-init.public-keys setting. cloud-init public-key processing currently supports strings, lists, dicts, sets as values for public-keys provided by cloud platform meta-data. If it were a string that contained newlines between keys, that'd work just fine and cloud-init will call splitlines() on that value.

I prefer if we can avoid receiving structured inputs which then require additional parsing. Unnecessary parsing inevitably leads to bugs and introduces corner cases. If a single format type is preferred, I would lean slightly towards a list of strings. This would benefit not just cloud-init with a simpler implementation, but also users because of more correct validation of inputs.

List of strings sounds good: it can easily be validated in lxd/instance/drivers/driver_common.go and presented as the YAML list expected by cloud-init's DataSourceLXD meta-data processing of public-keys. I also note that the existing lxd devlxdMetadataGetHandler doesn't need the leading #cloud-config (it's not cloud-config, and the comment header line is ignored for meta-data anyway).

So LXD config options have no concept of "list of strings"; all config options are a single string.

Good to know.

However, they can contain commas, newlines, etc., so depending on the expected content of the string, selecting an appropriate delimiter is important (e.g. can commas appear in SSH keys?).

If the format is well understood by LXD, then we can validate it, split it, and deliver it to cloud-init in the desired format (i.e. a list of strings).

That could work, and as @blackboxsw suggested, newline-delimited would be fine. We would just want to avoid passing empty strings, which might get introduced, for example, when a user puts two newlines between keys rather than one.
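
A one-line sketch of that normalization (illustrative Python; LXD would do the equivalent in Go when validating and splitting the config value):

def split_keys(raw: str) -> list[str]:
    # splitlines() plus a truthiness filter drops empty entries,
    # so double newlines between keys are harmless.
    return [line.strip() for line in raw.splitlines() if line.strip()]

print(split_keys("key-one\n\nkey-two\n"))  # -> ['key-one', 'key-two']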

I would like avoid having an undefined blob of data like the current meta-data setting is as it leads to the issues we've found where the format is not well understood in all situations.

Agreed

Regarding the upgrade path, we could make cloud-init and pycloudlib fall back to setting user.meta-data in the event that setting volatile.cloud_init.public-keys fails. This would bridge the gap between old and new versions for seamless rollout.

Sounds good.

we would need to be able to set this value in a profile (preferred) or on an instance before launch (less preferred, but workable).

volatile keys can only be set on the instance, not the profile, so I would suggest adding a new proper config key, such as security.ssh-keys.

Sounds good. As described above, it wouldn't be used internally anyway, so volatile doesn't make as much sense.

Interestingly, we've recently received a request for something similar from elsewhere in Canonical, albeit without requiring cloud-init (LXD would set up the SSH keys in the guest). So if we did add a config key like this, and LXD set up the keys directly, would this data even need to be exported to cloud-init's metadata?

Of course we could do both, but would they then potentially conflict?

cc @mionaalex

I think it would probably be preferable if we can exercise this code path in cloud-init using LXD, but I do think that doing both would conflict. Maybe we could get away with just testing this functionality on other clouds. @blackboxsw thoughts?

@tomponline
Member

I think it would probably be preferable if we can exercise this code path in cloud-init using LXD, but I do think that doing both would conflict. Maybe we could get away with just testing this functionality on other clouds. @blackboxsw thoughts?

Why are there multiple ways of setting up SSH keys in cloud-init via both meta-data and user-data?

@tomponline
Member

@blackboxsw does cloud-init apply the existing user.meta-data on every boot or only on first boot?

@tomponline tomponline added the Feature New feature, not a bug label Oct 21, 2024