
Syncthing startup hang #7392

Closed
fosslinux opened this issue Mar 30, 2022 · 21 comments · Fixed by QubesOS/qubes-vmm-xen#130
Labels
• affects-4.1 (This issue affects Qubes OS 4.1.)
• C: Xen
• diagnosed (Technical diagnosis has been performed; see issue comments.)
• hardware support
• P: default (Priority: default. Default priority for new issues, to be replaced given sufficient information.)
• r4.1-bookworm-stable, r4.1-bullseye-stable, r4.1-buster-stable, r4.1-centos-stream8-stable, r4.1-dom0-stable
• T: bug (Type: bug report. A problem or defect resulting in unintended behavior in something that exists.)

Comments

@fosslinux

fosslinux commented Mar 30, 2022

Qubes OS release

4.1

Brief summary

After my most recent set of updates, syncthing does not start in any of my VMs. This is not unique to any one distribution: I can reproduce it on Fedora, Debian, and even my not-yet-updated Void Linux StandaloneVM. It previously worked fine (regression).

I still have a system that I have not updated, on which syncthing works perfectly. I will attempt to track down the culprit update.

Debugging with delve seems to trace the hang to a CPUID call in https://github.com/klauspost/cpuid/blob/master/cpuid_amd64.s#L9.

I have tried:

  • changing the kernel version

I am yet to try:

  • downgrading any packages from dom0
  • just calling CPUID directly, to see whether CPUID itself is broken (see the sketch below)
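Something like the following would do for that last point (a rough, untested sketch using the __get_cpuid helper from GCC/Clang's cpuid.h):

/* Minimal CPUID smoke test (sketch, not yet run): issue CPUID for the
 * basic vendor leaf and print the raw registers. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 0: EAX holds the highest basic leaf, EBX/EDX/ECX the vendor string. */
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID not available");
        return 1;
    }
    printf("max basic leaf: %u\n", eax);
    printf("vendor: %.4s%.4s%.4s\n", (char *)&ebx, (char *)&edx, (char *)&ecx);
    return 0;
}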

Steps to reproduce

  1. Fully update a Qubes OS system.
  2. Install syncthing in any AppVM.
  3. Attempt to run syncthing and note the hang.

Expected behavior

Syncthing starts.

Actual behavior

Syncthing does not start.

@fosslinux added the "P: default" and "T: bug" labels on Mar 30, 2022
@andrewdavidwong added the "C: other" and "needs diagnosis" labels on Mar 30, 2022
@andrewdavidwong added this to the "Release 4.1 updates" milestone on Mar 30, 2022
@DemiMarie

Can you get a backtrace from the hang? Does the hang appear to be at the CPUID instruction itself (as if the instruction never finishes)? Can the hung process be killed? Does a simple program that calls CPUID manually work?

CPUID is intercepted by Xen and emulated in the hypervisor, so this could be a Xen bug.

@fosslinux (Author)

Here is the full backtrace and the code around the hang, using delve (the Go debugger):

$ dlv exec ./syncthing 
Type 'help' for list of commands.
(dlv) r
Process restarted with PID 16155
(dlv) c
received SIGINT, stopping process (will not forward signal)
> github.com/klauspost/cpuid/v2.asmCpuidex() /home/user/go/pkg/mod/github.com/klauspost/cpuid/v2@v2.0.9/cpuid_amd64.s:21 (PC: 0x85cfea)
    16:	// func asmCpuidex(op, op2 uint32) (eax, ebx, ecx, edx uint32)
    17:	TEXT ·asmCpuidex(SB), 7, $0
    18:		MOVL op+0(FP), AX
    19:		MOVL op2+4(FP), CX
    20:		CPUID
=>  21:		MOVL AX, eax+8(FP)
    22:		MOVL BX, ebx+12(FP)
    23:		MOVL CX, ecx+16(FP)
    24:		MOVL DX, edx+20(FP)
    25:		RET
    26:	
(dlv) bt
 0  0x000000000085cfea in github.com/klauspost/cpuid/v2.asmCpuidex
    at /home/user/go/pkg/mod/github.com/klauspost/cpuid/v2@v2.0.9/cpuid_amd64.s:21
 1  0x000000000085cdaa in github.com/klauspost/cpuid/v2.asmCpuidex
    at :0
 2  0x0000000000858525 in github.com/klauspost/cpuid/v2.(*CPUInfo).cacheSize
    at /home/user/go/pkg/mod/github.com/klauspost/cpuid/v2@v2.0.9/cpuid.go:765
 3  0x000000000085c396 in github.com/klauspost/cpuid/v2.addInfo
    at /home/user/go/pkg/mod/github.com/klauspost/cpuid/v2@v2.0.9/detect_x86.go:33
 4  0x0000000000856505 in github.com/klauspost/cpuid/v2.Detect
    at /home/user/go/pkg/mod/github.com/klauspost/cpuid/v2@v2.0.9/cpuid.go:257
 5  0x000000000085643c in github.com/klauspost/cpuid/v2.init.0
    at /home/user/go/pkg/mod/github.com/klauspost/cpuid/v2@v2.0.9/cpuid.go:236
 6  0x00000000004489c7 in runtime.doInit
    at /usr/lib/go/src/runtime/proc.go:6498
 7  0x0000000000448919 in runtime.doInit
    at /usr/lib/go/src/runtime/proc.go:6475
 8  0x0000000000448919 in runtime.doInit
    at /usr/lib/go/src/runtime/proc.go:6475
 9  0x0000000000448919 in runtime.doInit
    at /usr/lib/go/src/runtime/proc.go:6475
10  0x000000000043b754 in runtime.main
    at /usr/lib/go/src/runtime/proc.go:238
11  0x000000000046d041 in runtime.goexit
    at /usr/lib/go/src/runtime/asm_amd64.s:1581
(dlv) 

I believe it appears to be on line 21 because the CPUID instruction is executed (advancing the line counter) but never finishes, causing the hang. At least, that is how it seems.

Yes, the hung process can be killed.

However, a simple assembly program calling CPUID does not hang, which is pretty strange.
The assembly provided as a more in-depth example on the Wikipedia page for CPUID also returns correct values...

@DemiMarie

Yes, the hung process can be killed.

That seems to indicate that whatever is going on is not the process being stuck in Xen, since that wouldn’t be killable.

However, a simple assembly program calling CPUID does not hang, which is pretty strange.
The assembly provided as a more in-depth example on the Wikipedia page for CPUID also returns correct values...

Can you try manually calling CPUID with the exact same parameters as the failing call? This seems like a strange low-level bug in either Xen or Linux. Would you, by any chance, be able to see what happens in kernel mode when the CPUID instruction is issued?

@fosslinux (Author)

fosslinux commented Apr 5, 2022

Further investigation has revealed the following (not sure how I didn't pick up on some of this earlier):

This loop (the subleaf-enumeration loop in the library's cacheSize function, cpuid.go:765 in the backtrace above) never ends, creating the illusion that CPUID never ends (for some reason, whenever I Ctrl-C'd, it always stopped around CPUID, probably because those instructions are the most intensive).
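The enumeration pattern looks roughly like this (my own sketch in C rather than the library's actual Go code; the termination rule, cache type 0 in EAX bits 4:0 meaning "no more cache levels", follows the AMD cache-topology leaf definition):

/* Walk the subleaves of leaf 0x8000001d until the cache-type field
 * (EAX bits 4:0) reads 0, i.e. "no more cache levels".  Assumes the
 * leaf is implemented at all.  If the hypervisor returns the same
 * non-zero value for every subleaf, this loop never terminates. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    for (unsigned int subleaf = 0; ; subleaf++) {
        __cpuid_count(0x8000001d, subleaf, eax, ebx, ecx, edx);
        if ((eax & 0x1f) == 0)   /* cache type 0: enumeration finished */
            break;
        printf("subleaf %u: type=%u level=%u\n",
               subleaf, eax & 0x1f, (eax >> 5) & 0x7);
    }
    return 0;
}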

On my AMD Ryzen 1600AF (basically a 2600), the cpuidex call (extended CPUID, which takes an extra subleaf argument) returns exactly the same values regardless of the value of the second argument (which is what the loop increments). I think this is incorrect, but I cannot find data to back that up either way.

On my Intel i5-7200U laptop, after a complete system upgrade, I cannot reproduce this issue. It seems to be limited to AMD systems? (I don't have any more AMD hardware to test on.)

In my second Qubes install for testing things (on a separate drive), I can reproduce this problem, but NOT if I run the syncthing binary in dom0 (casual readers: please do not do this; the security implication is possible complete system compromise).

When I next have some spare time to investigate this, I will see whether the same CPUID call in dom0 and in a qube produces the same output. (Hypothesis: it will not.)

@marmarek (Member)

marmarek commented Apr 5, 2022

On my AMD Ryzen 1600AF (basically a 2600), the cpuidex call (extended CPUID, which takes an extra subleaf argument) returns exactly the same values regardless of the value of the second argument (which is what the loop increments).

Can you say specifically what values are returned there?

@fosslinux (Author)

Can you say specifically what values are returned there?

Values returned:
	eax: 16673
	ebx: 29360191
	ecx: 63
	edx: 0
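Decoding that eax value (an illustrative sketch; the field layout follows the AMD cache-topology leaf): 16673 is 0x4121, so the cache-type field in bits 4:0 reads 1 (a data cache) and the level field in bits 7:5 reads 1. Since the type field never reads 0 for any subleaf, the "no more cache levels" terminator never appears and the enumeration loop spins forever.

/* Decode the reported EAX value: 16673 == 0x4121. */
#include <stdio.h>

int main(void)
{
    unsigned int eax = 16673;   /* value reported above */

    printf("eax = %#x, cache type = %u, cache level = %u\n",
           eax, eax & 0x1f, (eax >> 5) & 0x7);
    return 0;
}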

@andyhhp

andyhhp commented Apr 5, 2022

This is a bug in Xen, in how we derive CPUID data for guests. Leaf 0x8000001d shouldn't be exposed because it's not yet tied into the topology logic (the number of CPUs sharing this cache needs adjusting based on the number of vCPUs the VM has).

@fosslinux (Author)

Ah, makes sense, but how come this worked in the past then?

@andyhhp

andyhhp commented Apr 6, 2022

It will be a consequence of https://github.com/QubesOS/qubes-vmm-xen/blob/xen-4.14/patch-stable-0002-x86-cpuid-support-LFENCE-always-serialising-CPUID-bi.patch being backported for speculation reasons. Before that patch, 0x8000001c was reported as the max extended leaf, so userspace wouldn't have gone looking at 0x8000001d.
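For reference, the maximum extended leaf a guest currently sees can be checked with a quick query of leaf 0x80000000 (a small sketch for illustration, using GCC/Clang's cpuid.h):

/* Print the highest supported extended CPUID leaf, which leaf
 * 0x80000000 reports in EAX. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    if (!__get_cpuid(0x80000000, &eax, &ebx, &ecx, &edx)) {
        puts("extended CPUID leaves not supported");
        return 1;
    }
    printf("max extended leaf: %#x\n", eax);
    return 0;
}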

@andyhhp

andyhhp commented Apr 6, 2022

Actually, there is also a bug in whatever piece of userspace is looping here. The AMD manual states that the contents of 0x8000001d and 0x8000001e are only valid if the TOPOEXT feature (CPUID.80000001.ecx[22]) is visible. Xen hides this feature, so userspace shouldn't go looking.
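In other words, well-behaved userspace would gate the cache-topology enumeration on something like this (a rough sketch, not any particular library's code; bit position per the AMD manual):

#include <cpuid.h>
#include <stdbool.h>
#include <stdio.h>

/* TOPOEXT lives in leaf 0x80000001, ECX bit 22. */
static bool has_topoext(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx >> 22) & 1;
}

int main(void)
{
    if (!has_topoext()) {
        puts("TOPOEXT not advertised: leave leaves 0x8000001d/0x8000001e alone");
        return 0;
    }
    puts("TOPOEXT advertised: cache-topology leaves are valid to enumerate");
    return 0;
}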

@DemiMarie

Actually, there is also a bug in whatever piece of userspace is looping here. The AMD manual states that the contents of 0x8000001d and 0x8000001e are only valid if the TOPOEXT feature (CPUID.80000001.ecx[22]) is visible. Xen hides this feature, so userspace shouldn't go looking.

Time to be mean and inject #GP into the buggy guest userspace? (Only half-joking.)

@andyhhp

andyhhp commented Apr 6, 2022

@fosslinux Any chance I can have a name & email address for a Reported-by tag on the upstream fix?

@fosslinux (Author)

I don't use my real name (at least at this time), so if fosslinux <fosslinux@aussies.space> is OK, please use that.

andyhhp added a commit to andyhhp/xen that referenced this issue Apr 7, 2022
c/s 1a91425 increased the AMD max leaf from 0x8000001c to 0x80000021, but
did not adjust anything in the calculate_*_policy() chain.  As a result, on
hardware supporting these leaves, we read the real hardware values into the
raw policy, then copy into host, and all the way into the PV/HVM default
policies.

All 4 of these leaves have enable bits (first two by TopoExt, next by SEV,
next by PQOS), so any software following the rules is fine and will leave them
alone.  However, leaf 0x8000001d takes a subleaf input and at least two
userspace utilities have been observed to loop indefinitely under Xen (clearly
waiting for eax to report "no more cache levels").

Such userspace is buggy, but Xen's behaviour isn't great either.

In the short term, clobber all information in these leaves.  This is a giant
bodge, but there are complexities with implementing all of these leaves
properly.

Fixes: 1a91425 ("x86/cpuid: support LFENCE always serialising CPUID bit")
Link: QubesOS/qubes-issues#7392
Reported-by: fosslinux <fosslinux@aussies.space>
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
@emacsomancer

Is there any current workaround for this issue?

@DemiMarie

@marmarek time to pop out a new Xen package?

olafhering pushed a commit to olafhering/xen that referenced this issue Apr 28, 2022
c/s 1a91425 increased the AMD max leaf from 0x8000001c to 0x80000021, but
did not adjust anything in the calculate_*_policy() chain.  As a result, on
hardware supporting these leaves, we read the real hardware values into the
raw policy, then copy into host, and all the way into the PV/HVM default
policies.

All 4 of these leaves have enable bits (first two by TopoExt, next by SEV,
next by PQOS), so any software following the rules is fine and will leave them
alone.  However, leaf 0x8000001d takes a subleaf input and at least two
userspace utilities have been observed to loop indefinitely under Xen (clearly
waiting for eax to report "no more cache levels").

Such userspace is buggy, but Xen's behaviour isn't great either.

In the short term, clobber all information in these leaves.  This is a giant
bodge, but there are complexities with implementing all of these leaves
properly.

Fixes: 1a91425 ("x86/cpuid: support LFENCE always serialising CPUID bit")
Link: QubesOS/qubes-issues#7392
Reported-by: fosslinux <fosslinux@aussies.space>
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: d4012d5
master date: 2022-04-07 11:36:45 +0100
@qubesos-bot

Automated announcement from builder-github

The package vmm-xen has been pushed to the r4.1 testing repository for the CentOS centos-stream8 template.
To test this update, please install it with the following command:

sudo yum update --enablerepo=qubes-vm-r4.1-current-testing

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The component vmm-xen (including package python3-xen-4.14.5-1.fc32) has been pushed to the r4.1 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package xen_4.14.5-1 has been pushed to the r4.1 testing repository for the Debian template.
To test this update, first enable the testing repository in /etc/apt/sources.list.d/qubes-*.list by uncommenting the line containing buster-testing (or appropriate equivalent for your template version), then use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@andrewdavidwong added the "diagnosed" label and removed the "needs diagnosis" label on May 4, 2022
@qubesos-bot

Automated announcement from builder-github

The package vmm-xen has been pushed to the r4.1 stable repository for the CentOS centos-stream8 template.
To install this update, please use the standard update command:

sudo yum update

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package xen_4.14.5-4+deb10u1 has been pushed to the r4.1 stable repository for the Debian template.
To install this update, please use the standard update command:

sudo apt-get update && sudo apt-get dist-upgrade

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The component vmm-xen (including package python3-xen-4.14.5-4.fc32) has been pushed to the r4.1 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

@andrewdavidwong added the "affects-4.1" label on Aug 8, 2023