This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

LXD refuses to start if kernel does not support AIO #228

Closed
stgraber opened this issue Sep 10, 2021 · 7 comments

@stgraber

Originally reported at https://github.com/lxc/lxd/issues/9189


I had to compile my own kernel for various reasons, and while I was at it I figured I would harden it a little. Among other things, I decided to disable AIO, since it is generally considered legacy and has had many security issues in the past (besides, io_uring is the new kid on the block).

However, once running on this kernel, 'lxd init --preseed' no longer works because lxd won't start. The logs contain the following stack trace:

Sep 02 18:46:52 hpv1 lxd.daemon[122708]: panic: runtime error: invalid memory address or nil pointer dereference
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: [signal SIGSEGV: segmentation violation code=0x1 addr=0xd08 pc=0x40d56e]
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: goroutine 1 [running]:
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/canonical/go-dqlite/internal/bindings._Cfunc_GoString(...)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         _cgo_gotypes.go:102
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/canonical/go-dqlite/internal/bindings.NewNode(0x1, 0x183fb84, 0x1, 0xc0001a1050, 0x28, 0x0, 0x0, 0x0)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/canonical/go-dqlite/internal/bindings/server.go:127 +0x136
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/canonical/go-dqlite.New(0x1, 0x183fb84, 0x1, 0xc0001a1050, 0x28, 0xc000a170c0, 0x1, 0x1, 0x8, 0x203000, ...)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/canonical/go-dqlite/node.go:70 +0xc5
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/lxc/lxd/lxd/cluster.(*Gateway).init(0xc0002ea2a0, 0xc000c29100, 0xc000c70b40, 0xc000458640)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/cluster/gateway.go:811 +0x471
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/lxc/lxd/lxd/cluster.NewGateway(0xc000316090, 0xc0002e4000, 0xc0002cac30, 0xc000a175b0, 0x2, 0x2, 0xc00019d350, 0x2, 0x0)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/cluster/gateway.go:66 +0x1d7
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.(*Daemon).init(0xc00014e680, 0xc0001af1b8, 0xc00014e680)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/daemon.go:962 +0x12ce
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.(*Daemon).Init(0xc00014e680, 0xc0003ec120, 0xc00014e680)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/daemon.go:707 +0x2f
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.(*cmdDaemon).Run(0xc0001af110, 0xc000035400, 0xc0002cc9c0, 0x0, 0x4, 0x0, 0x0)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/main_daemon.go:67 +0x36f
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/spf13/cobra.(*Command).execute(0xc000035400, 0xc0001c4010, 0x4, 0x4, 0xc000035400, 0xc0001c4010)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/spf13/cobra/command.go:856 +0x472
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000035400, 0xc00049df58, 0x1, 0x1)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/spf13/cobra/command.go:974 +0x375
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/spf13/cobra.(*Command).Execute(...)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/spf13/cobra/command.go:902
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.main()
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/main.go:218 +0x1af7
Sep 02 18:46:53 hpv1 lxd.daemon[122568]: => LXD failed to start

If AIO is intended to be a hard requirement, my suggestion would be to document this (the only reference to AIO I could find is a recommendation for the aio-max-nr sysctl) and perhaps add some error handling so LXD doesn't just panic.
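
A quick way to check whether a given kernel was built with AIO support is to probe the io_setup(2) syscall directly. The standalone C sketch below is illustrative only (it is not part of LXD or dqlite); on a kernel built without CONFIG_AIO the syscall is missing and fails with ENOSYS.

#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/aio_abi.h>   /* aio_context_t */

int main(void)
{
    aio_context_t ctx = 0;

    /* io_setup(2) has no glibc wrapper, so invoke it via syscall(2). */
    if (syscall(SYS_io_setup, 1, &ctx) == -1) {
        if (errno == ENOSYS)
            printf("kernel lacks AIO support (CONFIG_AIO=n)\n");
        else
            perror("io_setup");
        return 1;
    }

    syscall(SYS_io_destroy, ctx);
    printf("kernel supports AIO\n");
    return 0;
}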

Required information

  • Distribution: Ubuntu 20.04.1 LTS
  • The output of "lxc info":
# lxc info
config:
  cluster.https_address: 'snip'
  core.https_address: 'snip'
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 'snip'
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
snip
    -----END CERTIFICATE-----
  certificate_fingerprint: 1420snip062
  driver: qemu | lxc
  driver_version: 6.1.0 | 4.0.10
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.133-formicidae20210902165706
  lxc_features:
    cgroup2: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: true
  server_name: yes
  server_pid: 12644
  server_version: "4.17"
  storage: zfs
  storage_version: 2.1.0-1
  storage_supported_drivers:
  - name: zfs
    version: 2.1.0-1
    remote: false
  - name: ceph
    version: 15.2.13
    remote: true
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: cephfs
    version: 15.2.13
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.41.0
    remote: false
@cole-miller

cole-miller commented Aug 22, 2022

The AIO requirement is documented in README.md in this repo -- I can add a note in the dqlite, go-dqlite, and lxd repos as well.

The specific error here is a null pointer dereference: something in the call stack of dqlite_node_create fails without setting the errmsg field, so we end up trying to read it from a nonexistent struct dqlite_node. I'll fix that, and then the feedback when running without AIO support should be better.
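
For illustration, here is a tiny standalone C sketch of that failure mode (it is not the actual dqlite or go-dqlite code): a create function that fails before ever assigning its out parameter, and a caller that has to guard against reading the error message from a node that was never created.

#include <stdio.h>
#include <stdlib.h>

struct node {
    char errmsg[256]; /* stands in for the errmsg field of struct dqlite_node */
};

/* Fails early (say, because the kernel lacks AIO) without ever assigning *n. */
static int node_create(struct node **n)
{
    (void)n;
    return -1;
}

int main(void)
{
    struct node *n = NULL;

    if (node_create(&n) != 0) {
        /* Reading n->errmsg here would dereference NULL, which is the
           same failure mode as the Go panic in the report above, so the
           caller must check explicitly. */
        if (n != NULL)
            fprintf(stderr, "error: %s\n", n->errmsg);
        else
            fprintf(stderr, "error: node was never created\n");
        return 1;
    }
    free(n);
    return 0;
}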

@cole-miller

Error message should now be something like:

Error: create node: failed to create node: internal dqlite error

Not the greatest, but better than a panic. Maybe it would be a good idea to modify dqlite_node_create to allow returning an error string (using some other channel than the errmsg field of struct dqlite_node)? Then we wouldn't have to interpret the rather non-specific return code in go-dqlite.
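
One possible shape for that "other channel" is sketched below as a hypothetical declaration. This function does not exist in dqlite, and the id/address/data_dir parameters simply mirror the assumed dqlite_node_create signature; the point is that the error text travels through a caller-supplied buffer, so it is available even when no struct dqlite_node was allocated.

#include <stddef.h>
#include <dqlite.h>

int dqlite_node_create2(dqlite_node_id id,
                        const char *address,
                        const char *data_dir,
                        dqlite_node **n,
                        char *errbuf,      /* filled with a message on failure */
                        size_t errbuf_len);

A caller could then print errbuf directly instead of mapping a bare return code to a generic message in go-dqlite.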

@freeekanayaka

Error message should now be something like:

Error: create node: failed to create node: internal dqlite error

Not the greatest, but better than a panic. Maybe it would be a good idea to modify dqlite_node_create to allow returning an error string (using some other channel than the errmsg field of struct dqlite_node)? Then we wouldn't have to interpret the rather non-specific return code in go-dqlite.

What do you think of changing the behavior of dqlite_node_create() so that, even when it returns an error, it does not free the node passed back through the dqlite_node **n parameter? That way call sites could invoke dqlite_node_errmsg() to extract the error message and then call dqlite_node_destroy() to free the memory.

The only exception to the above behavior would perhaps be when dqlite_node_create fails with DQLITE_NOMEM, but that's a minor detail in practice, probably not something we need to worry about too much in the real world.

@cole-miller

That seems reasonable to me. I assume the current behavior (clearing the struct dqlite_node) is defense in depth against callers who neglect to check for errors -- maybe we could address that concern in some other way.

@freeekanayaka

That seems reasonable to me. I assume the current behavior (clearing the struct dqlite_node) is defense in depth against callers who neglect to check for errors -- maybe we could address that concern in some other way.

I don't think there's a particular reason for the current behavior. As long as we document that dqlite_node_destroy() must always be called, even in case of error, we should be good and won't need to do anything special.

The idiomatic way would be something like:

int rc;
dqlite_node *node;

rc = dqlite_node_create(..., &node);
if (rc != 0) {
    if (rc == DQLITE_NOMEM) {
        printf("out of memory\n");
    } else {
        printf("%s\n", dqlite_node_errmsg(node));
        dqlite_node_destroy(node);
    }
    exit(-1); /* or whatever */
}

@cole-miller

Now that the AIO requirement is documented everywhere and the go-dqlite error message is fixed (see canonical/go-dqlite#199) I think this can be closed?

@stgraber stgraber closed this as completed Sep 1, 2022
@cole-miller

cole-miller commented Oct 11, 2022 via email
