This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

LXD refuses to start if kernel does not support AIO #228

Closed
stgraber opened this issue Sep 10, 2021 · 7 comments

@stgraber

Originally reported at https://github.com/lxc/lxd/issues/9189


I had to compile my own kernel for various reasons, and while I was at it I figured I would harden it a little. Among other things, I decided to disable AIO, since it is generally considered legacy and has had many security issues in the past (besides, io_uring is the new kid on the block).

However, once running on this kernel, 'lxd init --preseed' no longer works because lxd won't start. The logs contain the following stack trace:

Sep 02 18:46:52 hpv1 lxd.daemon[122708]: panic: runtime error: invalid memory address or nil pointer dereference
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: [signal SIGSEGV: segmentation violation code=0x1 addr=0xd08 pc=0x40d56e]
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: goroutine 1 [running]:
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/canonical/go-dqlite/internal/bindings._Cfunc_GoString(...)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         _cgo_gotypes.go:102
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/canonical/go-dqlite/internal/bindings.NewNode(0x1, 0x183fb84, 0x1, 0xc0001a1050, 0x28, 0x0, 0x0, 0x0)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/canonical/go-dqlite/internal/bindings/server.go:127 +0x136
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/canonical/go-dqlite.New(0x1, 0x183fb84, 0x1, 0xc0001a1050, 0x28, 0xc000a170c0, 0x1, 0x1, 0x8, 0x203000, ...)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/canonical/go-dqlite/node.go:70 +0xc5
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/lxc/lxd/lxd/cluster.(*Gateway).init(0xc0002ea2a0, 0xc000c29100, 0xc000c70b40, 0xc000458640)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/cluster/gateway.go:811 +0x471
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/lxc/lxd/lxd/cluster.NewGateway(0xc000316090, 0xc0002e4000, 0xc0002cac30, 0xc000a175b0, 0x2, 0x2, 0xc00019d350, 0x2, 0x0)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/cluster/gateway.go:66 +0x1d7
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.(*Daemon).init(0xc00014e680, 0xc0001af1b8, 0xc00014e680)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/daemon.go:962 +0x12ce
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.(*Daemon).Init(0xc00014e680, 0xc0003ec120, 0xc00014e680)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/daemon.go:707 +0x2f
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.(*cmdDaemon).Run(0xc0001af110, 0xc000035400, 0xc0002cc9c0, 0x0, 0x4, 0x0, 0x0)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/main_daemon.go:67 +0x36f
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/spf13/cobra.(*Command).execute(0xc000035400, 0xc0001c4010, 0x4, 0x4, 0xc000035400, 0xc0001c4010)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/spf13/cobra/command.go:856 +0x472
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000035400, 0xc00049df58, 0x1, 0x1)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/spf13/cobra/command.go:974 +0x375
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: github.com/spf13/cobra.(*Command).Execute(...)
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/spf13/cobra/command.go:902
Sep 02 18:46:52 hpv1 lxd.daemon[122708]: main.main()
Sep 02 18:46:52 hpv1 lxd.daemon[122708]:         /build/lxd/parts/lxd/src/.go/src/github.com/lxc/lxd/lxd/main.go:218 +0x1af7
Sep 02 18:46:53 hpv1 lxd.daemon[122568]: => LXD failed to start

If AIO is intended to be a hard requirement, my suggestion would be to document this (the only reference to AIO I could find is a recommendation for the aio-max-nr sysctl) and perhaps add some error handling so LXD doesn't just panic.
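
A quick way to check whether a given kernel was built with AIO support is to probe the io_setup(2) syscall directly. The standalone C sketch below is illustrative only (it is not part of LXD or dqlite); on a kernel built without CONFIG_AIO the syscall is missing and fails with ENOSYS.

#include <errno.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/aio_abi.h>   /* aio_context_t */

int main(void)
{
    aio_context_t ctx = 0;

    /* io_setup(2) has no glibc wrapper, so invoke it via syscall(2). */
    if (syscall(SYS_io_setup, 1, &ctx) == -1) {
        if (errno == ENOSYS)
            printf("kernel lacks AIO support (CONFIG_AIO=n)\n");
        else
            perror("io_setup");
        return 1;
    }

    syscall(SYS_io_destroy, ctx);
    printf("kernel supports AIO\n");
    return 0;
}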

Required information

  • Distribution: Ubuntu 20.04.1 LTS
  • The output of "lxc info":
# lxc info
config:
  cluster.https_address: 'snip'
  core.https_address: 'snip'
  core.trust_password: true
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
environment:
  addresses:
  - 'snip'
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
snip
    -----END CERTIFICATE-----
  certificate_fingerprint: 1420snip062
  driver: qemu | lxc
  driver_version: 6.1.0 | 4.0.10
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.4.133-formicidae20210902165706
  lxc_features:
    cgroup2: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "20.04"
  project: default
  server: lxd
  server_clustered: true
  server_name: yes
  server_pid: 12644
  server_version: "4.17"
  storage: zfs
  storage_version: 2.1.0-1
  storage_supported_drivers:
  - name: zfs
    version: 2.1.0-1
    remote: false
  - name: ceph
    version: 15.2.13
    remote: true
  - name: btrfs
    version: 5.4.1
    remote: false
  - name: cephfs
    version: 15.2.13
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.07(2) (2019-11-30) / 1.02.167 (2019-11-30) / 4.41.0
    remote: false
@cole-miller

cole-miller commented Aug 22, 2022

The AIO requirement is documented in README.md in this repo -- I can add a note in the dqlite, go-dqlite, and lxd repos as well.

The specific error here is a null pointer dereference: something in the call stack of dqlite_node_create fails without setting the errmsg field, so we end up trying to read it from a nonexistent struct dqlite_node. I'll fix that, and then the feedback when running without AIO support should be better.
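
For illustration, here is a tiny standalone C sketch of that failure mode (it is not the actual dqlite or go-dqlite code): a create function that fails before ever assigning its out parameter, and a caller that has to guard against reading the error message from a node that was never created.

#include <stdio.h>
#include <stdlib.h>

struct node {
    char errmsg[256]; /* stands in for the errmsg field of struct dqlite_node */
};

/* Fails early (say, because the kernel lacks AIO) without ever assigning *n. */
static int node_create(struct node **n)
{
    (void)n;
    return -1;
}

int main(void)
{
    struct node *n = NULL;

    if (node_create(&n) != 0) {
        /* Reading n->errmsg here would dereference NULL, which is the
           same failure mode as the Go panic in the report above, so the
           caller must check explicitly. */
        if (n != NULL)
            fprintf(stderr, "error: %s\n", n->errmsg);
        else
            fprintf(stderr, "error: node was never created\n");
        return 1;
    }
    free(n);
    return 0;
}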

@cole-miller

Error message should now be something like:

Error: create node: failed to create node: internal dqlite error

Not the greatest, but better than a panic. Maybe it would be a good idea to modify dqlite_node_create to allow returning an error string (using some other channel than the errmsg field of struct dqlite_node)? Then we wouldn't have to interpret the rather non-specific return code in go-dqlite.
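
One possible shape for that "other channel" is sketched below as a hypothetical declaration. This function does not exist in dqlite, and the id/address/data_dir parameters simply mirror the assumed dqlite_node_create signature; the point is that the error text travels through a caller-supplied buffer, so it is available even when no struct dqlite_node was allocated.

#include <stddef.h>
#include <dqlite.h>

int dqlite_node_create2(dqlite_node_id id,
                        const char *address,
                        const char *data_dir,
                        dqlite_node **n,
                        char *errbuf,      /* filled with a message on failure */
                        size_t errbuf_len);

A caller could then print errbuf directly instead of mapping a bare return code to a generic message in go-dqlite.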

@freeekanayaka

Error message should now be something like:

Error: create node: failed to create node: internal dqlite error

Not the greatest, but better than a panic. Maybe it would be a good idea to modify dqlite_node_create to allow returning an error string (using some other channel than the errmsg field of struct dqlite_node)? Then we wouldn't have to interpret the rather non-specific return code in go-dqlite.

What do you think of changing the behavior of dqlite_node_create() so that, even when it returns an error, it does not free the node passed back through the dqlite_node **n parameter? That way call sites could invoke dqlite_node_errmsg() to extract the error message and then call dqlite_node_destroy() to free the memory.

The only exception to the above behavior would perhaps be when dqlite_node_create fails with DQLITE_NOMEM, but that's a minor detail in practice, probably not something we need to worry about too much in the real world.

@cole-miller

That seems reasonable to me. I assume the current behavior (clearing the struct dqlite_node) is defense in depth against callers who neglect to check for errors -- maybe we could address that concern in some other way.

@freeekanayaka

That seems reasonable to me. I assume the current behavior (clearing the struct dqlite_node) is defense in depth against callers who neglect to check for errors -- maybe we could address that concern in some other way.

I don't think there's a particular reason for the current behavior. As long as we document that dqlite_node_destroy() must always be called, even in case of error, we should be good and won't need to do anything special.

The idiomatic way would be something like:

int rc;
dqlite_node *node;

rc = dqlite_node_create(..., &node);
if (rc != 0) {
    if (rc == DQLITE_NOMEM) {
        printf("out of memory\n");
    } else {
        printf("%s\n", dqlite_node_errmsg(node));
        dqlite_node_destroy(node);
    }
    exit(-1); /* or whatever */
}

@cole-miller

Now that the AIO requirement is documented everywhere and the go-dqlite error message is fixed (see canonical/go-dqlite#199) I think this can be closed?

@stgraber stgraber closed this as completed Sep 1, 2022
@cole-miller

cole-miller commented Oct 11, 2022 via email
