Kernel Panic with 9.1.15 #79

Open
qiyuanzhi opened this issue Nov 17, 2023 · 3 comments

Comments

@qiyuanzhi

Hello!

I got a kernel panic with 9.1.15.

It is a 3-node cluster, and two nodes hit this panic at the same time.

Here is the call trace:

[16259.065253] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d node-3: Preparing remote state change 1878057944
[16259.067843] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d node-3: Committing remote state change 1878057944 (primary_nodes=0)
[16259.067859] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d/0 drbd1020 node-3: pdsk( UpToDate -> Detaching )
[16259.069646] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d/0 drbd1020 node-3: pdsk( Detaching -> Diskless )
[16259.076779] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d: Preparing cluster-wide state change 277524540 (1->-1 7680/1024)
[16259.077210] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d: State change 277524540: primary_nodes=0, weak_nodes=0
[16259.077214] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d: Committing cluster-wide state change 277524540 (1ms)
[16259.077251] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d/0 drbd1020: disk( UpToDate -> Detaching )
[16259.077489] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d/0 drbd1020: disk( Detaching -> Diskless )
[16259.077951] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d/0 drbd1020: drbd_bm_resize called with capacity == 0
[16259.181762] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d: ASSERTION context->flags & CS_SERIALIZE FAILED in change_cluster_wide_state
[16259.184705] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d: State change failed: State change was refused by peer node
[16259.186149] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d/0 drbd1020 node-3: Failed: pdsk( Diskless -> DUnknown ) repl( Established -> Off )
[16259.186204] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d: ASSERTION context->flags & CS_SERIALIZE FAILED in change_cluster_wide_state
[16259.188355] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d: State change failed: State change was refused by peer node
[16259.189554] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d/0 drbd1020: Failed: quorum( yes -> no )
[16259.189558] drbd pvc-9a4e7d3f-ac7c-4ab1-af10-1209b7c6c13d/0 drbd1020 node-1: Failed: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
[16259.702141] BUG: kernel NULL pointer dereference, address: 0000000000000010
[16259.703513] #PF: supervisor read access in kernel mode
[16259.704930] #PF: error_code(0x0000) - not-present page
[16259.706101] PGD b120ca067 P4D b120ca067 PUD 8566dc067 PMD 0
[16259.707261] Oops: 0000 [#1] SMP NOPTI
[16259.708401] CPU: 19 PID: 2259963 Comm: drbd_r_pvc-9a4e Kdump: loaded Tainted: G           OE     5.15.67-6.cl9.x86_64 #1
[16259.709532] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.18227214.B64.2106252220 06/25/2021
[16259.711804] RIP: 0010:process_twopc+0x8fc/0x1060 [drbd]
[16259.713069] Code: db 48 89 44 24 10 e9 8b fc ff ff 48 c7 44 24 10 00 00 00 00 e9 7d fc ff ff 48 8b 7c 24 10 48 89 f8 48 85 ff 0f 84 32 04 00 00 <48> 8b 78 10 8b 87 38 01 00 00 85 c0 0f 84 d5 01 00 00 f0 ff 87 70
[16259.715535] RSP: 0018:ffff98b519743d40 EFLAGS: 00010046
[16259.716711] RAX: 0000000000000000 RBX: ffff88b43c120800 RCX: 0000000000000000
[16259.717814] RDX: 0000000000000000 RSI: ffff88bd916ec068 RDI: 0000000000000000
[16259.718860] RBP: ffff88bd916ec000 R08: 0000000000000000 R09: ffff88bd916ec060
[16259.720007] R10: 0000000000000000 R11: 0000000000000000 R12: ffff98b519743e70
[16259.721127] R13: 0000000000000000 R14: ffff98b519743da0 R15: ffff88b77c1b5000
[16259.722183] FS:  0000000000000000(0000) GS:ffff88cb3c2c0000(0000) knlGS:0000000000000000
[16259.723186] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16259.724153] CR2: 0000000000000010 CR3: 00000011537cc005 CR4: 0000000000770ee0
[16259.725140] PKRU: 55555554
[16259.726164] Call Trace:
[16259.727087]  <TASK>
[16259.728015]  ? dtt_recv+0xbb/0x180 [drbd_transport_tcp]
[16259.728932]  receive_twopc+0x97/0x100 [drbd]
[16259.729851]  ? process_twopc+0x1060/0x1060 [drbd]
[16259.730750]  drbdd+0x145/0x290 [drbd]
[16259.731742]  drbd_receiver+0x41/0x60 [drbd]
[16259.732918]  drbd_thread_setup+0x74/0x1e0 [drbd]
[16259.733821]  ? __drbd_next_peer_device_ref+0x120/0x120 [drbd]
[16259.734701]  kthread+0x124/0x150
[16259.735609]  ? set_kthread_struct+0x50/0x50
[16259.736459]  ret_from_fork+0x1f/0x30
[16259.737274]  </TASK>

It shows a NULL pointer dereference in process_twopc.

Thanks!
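For readers triaging the trace above, here is a minimal Python sketch that pulls the faulting symbol and offset out of the `RIP:` line and decodes the instruction marked `<48>` in the `Code:` line. The helper names (`parse_rip`, `faulting_bytes`, `decode_mov_disp8`) are hypothetical and not part of any DRBD or kernel tooling, and the decoder handles only the single `REX.W MOV r64, [r64+disp8]` encoding that appears here. Under those assumptions, the bytes `48 8b 78 10` read as a load from `rax + 0x10`, which with `RAX = 0` is consistent with the reported fault address `CR2 = 0x10`.

```python
import re

# Values copied from the oops above (timestamps stripped).
RIP_LINE = "RIP: 0010:process_twopc+0x8fc/0x1060 [drbd]"
CODE_LINE = (
    "Code: db 48 89 44 24 10 e9 8b fc ff ff 48 c7 44 24 10 00 00 00 00 "
    "e9 7d fc ff ff 48 8b 7c 24 10 48 89 f8 48 85 ff 0f 84 32 04 00 00 "
    "<48> 8b 78 10 8b 87 38 01 00 00 85 c0 0f 84 d5 01 00 00 f0 ff 87 70"
)
RAX = 0x0   # from the register dump
CR2 = 0x10  # faulting address from the register dump

REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def parse_rip(line):
    """Split 'symbol+0xOFF/0xSIZE [module]' into its parts."""
    m = re.search(
        r"RIP: [0-9a-f]+:(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+) \[(\w+)\]", line
    )
    if m is None:
        raise ValueError("not a RIP line in the expected format")
    symbol, offset, size, module = m.groups()
    return {
        "symbol": symbol,
        "offset": int(offset, 16),
        "size": int(size, 16),
        "module": module,
    }


def faulting_bytes(line, count=4):
    """Return `count` bytes starting at the <xx> marker (the faulting instruction)."""
    tokens = line.split()[1:]  # drop the leading "Code:"
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:start + count])


def decode_mov_disp8(insn):
    """Decode only REX.W MOV r64, [r64+disp8]; not a general disassembler."""
    rex, opcode, modrm, disp = insn
    if rex != 0x48 or opcode != 0x8B or (modrm >> 6) != 0b01 or (modrm & 7) == 4:
        raise ValueError("unsupported encoding for this sketch")
    disp = disp - 256 if disp >= 128 else disp  # disp8 is signed
    dest = REGS64[(modrm >> 3) & 7]
    base = REGS64[modrm & 7]
    return dest, base, disp


if __name__ == "__main__":
    rip = parse_rip(RIP_LINE)
    dest, base, disp = decode_mov_disp8(faulting_bytes(CODE_LINE))
    print(f"{rip['module']}:{rip['symbol']}+{rip['offset']:#x} of {rip['size']:#x}")
    print(f"mov {dest}, [{base}{disp:+#x}]  (RAX + disp = {RAX + disp:#x}, CR2 = {CR2:#x})")
```

With a `drbd.ko` built with debug info for the exact same version, the `symbol+offset` pair is what one would hand to a tool such as `gdb` or the kernel's `scripts/faddr2line` to find the source line; that step is not shown because it depends on the local build.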

@rck
Member

rck commented Nov 17, 2023

Does it reproduce with 9.1.17 (which would be the current version)? https://pkg.linbit.com//downloads/drbd/9/drbd-9.1.17.tar.gz

@qiyuanzhi
Author

I'm not sure. This is the first time I have seen this panic; creating and deleting volumes worked fine in the past.

I tried to reproduce the bug, but it did not occur again.

Is it resolved in 9.1.17?

@rck
Member

rck commented Nov 17, 2023

I'm sure there have been bugs that got fixed, but I'm not sure whether this particular issue rings a bell with any of the devs. It is just that most of us don't really bother to spend time on issues for outdated versions. Trying to find a reproducer is one thing, but spending that time on an old version only to find out it already got changed/fixed is something else. Sorry, just stating the facts, maybe you are lucky :)
