
[BUG] keydb on Debian 12 failed after upgrade from 6.3.3 to 6.3.4 #774

Open
holkmann opened this issue Jan 9, 2024 · 11 comments
@holkmann

holkmann commented Jan 9, 2024

Describe the bug

Debian 12 (Proxmox VM, Proxmox 8.1.3), freshly installed, used only for KeyDB. KeyDB 6.3.3 works without errors. As soon as I upgrade to 6.3.4, keydb-server no longer starts. If I switch back to version 6.3.3, KeyDB works again. I was able to reproduce this with different Debian 12 VMs.

To reproduce

Note: I previously had version 6.3.3 on hold via apt-mark, so only the upgrade step is shown below; that is sufficient to reproduce the problem.

root@keydb:~# apt upgrade
Reading package lists… Done
Building dependency tree… Done
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  keydb keydb-server keydb-tools
3 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 13.0 MB of archives.
After this operation, 12.5 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 https://download.keydb.dev/open-source-dist bookworm/main amd64 keydb all 6:6.3.4-1+deb12u1 [21.0 kB]
Get:2 https://download.keydb.dev/open-source-dist bookworm/main amd64 keydb-server amd64 6:6.3.4-1+deb12u1 [59.8 kB]
Get:3 https://download.keydb.dev/open-source-dist bookworm/main amd64 keydb-tools amd64 6:6.3.4-1+deb12u1 [13.0 MB]
Fetched 13.0 MB in 2s (6,271 kB/s)
Reading changelogs... Done
(Reading database ... 47946 files and directories currently installed.)
Preparing to unpack .../keydb_6%3a6.3.4-1+deb12u1_all.deb ...
Unpacking keydb (6:6.3.4-1+deb12u1) over (6:6.3.3-1+deb12u1) ...
Preparing to unpack .../keydb-server_6%3a6.3.4-1+deb12u1_amd64.deb ...
Unpacking keydb-server (6:6.3.4-1+deb12u1) over (6:6.3.3-1+deb12u1) ...
Preparing to unpack .../keydb-tools_6%3a6.3.4-1+deb12u1_amd64.deb ...
Unpacking keydb-tools (6:6.3.4-1+deb12u1) over (6:6.3.3-1+deb12u1) ...
Setting up keydb-tools (6:6.3.4-1+deb12u1)...
Setting up keydb-server (6:6.3.4-1+deb12u1)...
Could not execute systemctl: at /usr/bin/deb-systemd-invoke line 145.
Setting up keydb (6:6.3.4-1+deb12u1)...
Processing triggers for man-db (2.11.2-2)...

Why this? => "Could not execute systemctl: at /usr/bin/deb-systemd-invoke line 145."

root@keydb:~# dpkg -l|grep keydb
ii keydb 6:6.3.4-1+deb12u1 all Persistent key-value database with network interface (metapackage)
ii keydb-server 6:6.3.4-1+deb12u1 amd64 Persistent key-value database with network interface
ii keydb-tools 6:6.3.4-1+deb12u1 amd64 Persistent key-value database with network interface (client)
root@keydb:~# systemctl status keydb-server
× keydb-server.service - Advanced key-value store
      Loaded: loaded (/lib/systemd/system/keydb-server.service; enabled; preset: enabled)
      Active: failed (Result: signal) since Tue 2024-01-09 07:38:57 CET; 28s ago
    Duration: 5min 42,385s
        Docs: https://docs.keydb.dev,
              man:keydb-server(1)
     Process: 8261 ExecStart=/usr/bin/keydb-server /etc/keydb/keydb.conf (code=killed, signal=ILL)
         CPU: 24ms

Jan 09 07:38:57 keydb systemd[1]: keydb-server.service: Scheduled restart job, restart counter is at 6.
Jan 09 07:38:57 keydb systemd[1]: Stopped keydb-server.service - Advanced key-value store.
Jan 09 07:38:57 keydb systemd[1]: keydb-server.service: Start request repeated too quickly.
Jan 09 07:38:57 keydb systemd[1]: keydb-server.service: Failed with result 'signal'.
Jan 09 07:38:57 keydb systemd[1]: Failed to start keydb-server.service - Advanced key-value store.

The keydb-server.log only shows this:

499:487:M 09 Jan 2024 07:33:14.233 # Server initialized
499:487:M 09 Jan 2024 07:33:14.234 * Loading RDB produced by version 6.3.3
499:487:M 09 Jan 2024 07:33:14.234 * RDB age 1108229 seconds
499:487:M 09 Jan 2024 07:33:14.234 * RDB memory usage when created 2.02 Mb
499:487:M 09 Jan 2024 07:33:14.234 # Done loading RDB, keys loaded: 0, keys expired: 0.
499:487:M 09 Jan 2024 07:33:14.234 * DB loaded from disk: 0.001 seconds
499:556:M 09 Jan 2024 07:33:14.234 * Thread 0 alive.
499:557:M 09 Jan 2024 07:33:14.234 * Thread 1 alive.
499:signal-handler (1704782335) Received SIGTERM scheduling shutdown...
499:signal-handler (1704782335) Received SIGTERM scheduling shutdown...
499:556:M 09 Jan 2024 07:38:55.984 # User requested shutdown...
499:556:M 09 Jan 2024 07:38:55.984 * Saving the final RDB snapshot before exiting.
499:556:M 09 Jan 2024 07:38:55.999 * DB saved on disk
499:556:M 09 Jan 2024 07:38:56.000 * Removing the pid file.
499:556:M 09 Jan 2024 07:38:56.000 # KeyDB is now ready to exit, bye bye...
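
For completeness: the log above appears to cover only the old 6.3.3 instance being stopped during the upgrade; the failing 6.3.4 starts (signal=ILL in the status output) die before they can log anything. A minimal way to see the raw failure, assuming the paths from the unit file, is to run the server in the foreground and check the kernel log for the trap:

root@keydb:~# /usr/bin/keydb-server /etc/keydb/keydb.conf
root@keydb:~# dmesg | grep -i 'trap invalid opcode'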

Expected behavior

I would have expected that KeyDB or KeyDB server would continue to work after an upgrade. :-)
Have I perhaps overlooked something or made a mistake somewhere?

Additional information

Proxmox VM info:
OS: Debian GNU/Linux 12 (bookworm) x86_64
Host: KVM/QEMU (Standard PC (i440FX + PIIX, 1996) pc-i440fx-8.1)
Kernel: 6.1.0-15-amd64

Thanks for the help! If you need more information, just let me know.

@everii-mapi

I am seeing the same issue:

# keydb-server
Illegal instruction

Downgrading to 6:6.3.3-1+deb12u1 fixed the problem.
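
If a core dump gets captured, the exact faulting instruction can presumably be inspected as well (a sketch; it assumes systemd-coredump and gdb are installed, which is not the case by default):

# open the most recent keydb-server dump in gdb
coredumpctl gdb keydb-server
# inside gdb, show the single instruction the process died on
(gdb) x/i $pc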

viceice added a commit to visualon/docker-images that referenced this issue Jan 16, 2024
@viceice

viceice commented Jan 16, 2024

Seeing the same on the official docker image 😕

@petermade

Seeing the same. Could it be due to older hardware? In production (Ubuntu 22.04.3 LTS) it's working; however, on our acceptance environment, which runs the exact same software but is virtualized through Proxmox, I have this problem as well.

@frankfil

One of my clients runs Hyper-V, and I'm seeing the same issue with a Rocky Linux 8 VM running the Docker version.

Interestingly, that VM has Processor Compatibility Mode (PCM) enabled. If I reboot the VM with PCM disabled, 6.3.4 runs.

Unfortunately, that VM requires PCM to be enabled, so we are stuck on 6.3.3 for the time being.

@frankfil

Had an opportunity to test the Docker version of 6.3.4 on a Rocky Linux 9 VM under Proxmox 8.1.3, and it does work for me with the VM's CPU type set to Haswell-noTSX-IBRS (the host it runs on has a newer CPU than that; another host in the datacenter has an older CPU, hence that setting).

@skid9000

Same problem here on a Debian VM with the default x86-64-v2-AES CPU type on an Intel Xeon E5-2630L v2 CPU.

6.3.4 fails with an illegal hardware instruction, but going back to 6.3.3 works fine.

lscpu from the VM

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        QEMU
  Model name:            QEMU Virtual CPU version 2.5+
    BIOS Model name:     pc-i440fx-8.1  CPU @ 2.0GHz
    BIOS CPU family:     1
    CPU family:          15
    Model:               107
    Thread(s) per core:  1
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            1
    BogoMIPS:            4799,99
    Flags:               fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology cpuid tsc_known_freq pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm cpuid_fault pti
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    32 MiB (8 instances)
  L3:                    16 MiB (1 instance)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-7
  NUMA node1 CPU(s):     
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Mitigation: VMX unsupported
  L1tf:                  Mitigation; PTE Inversion
  Mds:                   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state 
                         unknown
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Unknown: No mitigations
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

lscpu from the Host

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
    CPU family:          6
    Model:               62
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           2
    Stepping:            4
    CPU(s) scaling MHz:  46%
    CPU max MHz:         2800.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            4800.14
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   384 KiB (12 instances)
  L1i:                   384 KiB (12 instances)
  L2:                    3 MiB (12 instances)
  L3:                    30 MiB (2 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
  NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Mitigation: Split huge pages
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Unknown: No mitigations
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
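
For what it's worth, the guest flag list above ends at sse4_1/sse4_2/aes (roughly the x86-64-v2 baseline plus AES), while the host additionally advertises avx, f16c and more. An illegal-instruction crash in 6.3.4 would be consistent with the binary using something beyond that baseline. A quick check inside the guest (sketch):

# prints 0 under the x86-64-v2-AES model, since AVX is only part of x86-64-v3
grep -c -w avx /proc/cpuinfo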

@holkmann
Author

holkmann commented Feb 7, 2024

Addendum: Changing the "CPU type" from "x86-64-v2-AES" (the Proxmox 8 default) to "host" resolves this problem. (VM Hardware tab => Processors)
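
The same change can presumably also be made from the Proxmox host's CLI (a sketch, not taken from this thread; 100 is a placeholder VMID, and the VM needs a full stop/start rather than a reboot from inside the guest to pick up the new CPU model):

qm set 100 --cpu host
qm stop 100 && qm start 100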

@skid9000

I mean, it's a workaround, not really a solution. (host isn't recommended for a cluster whose nodes have different CPUs, as far as I know.)

@holkmann
Author

I mean, it's a workaround, not really a solution. (host isn't recommended for a cluster whose nodes have different CPUs, as far as I know.)

Of course you're right, but we'll have to wait until one of the developers has a solution for this.

@maikirakiwi

Still present as of today

@holkmann
Author

Still present as of today

True, but I think it has to do with this statement:
#798 (comment)

We will probably have to wait a while for a bug fix.
