pacific: osd: fix shard-threads cannot wakeup bug #51262

Merged
merged 1 commit into from Nov 28, 2023

@k0ste k0ste commented Apr 27, 2023

backport tracker: https://tracker.ceph.com/issues/52841


backport of #43360
parent tracker: https://tracker.ceph.com/issues/52781

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

@k0ste k0ste requested a review from a team as a code owner April 27, 2023 13:16
@github-actions github-actions bot added the core label Apr 27, 2023
@github-actions github-actions bot added this to the pacific milestone Apr 27, 2023
k0ste commented Jul 18, 2023

needs-qa

Reproduce:
(1) the Ceph cluster is not running any client I/O
(2) the only operation issued is: ceph osd in osd.14

Reason:
(1) one shard-queue is served by three shard-threads
(2) one or more PeeringOps carry an epoch greater than the osdmap epoch
    held by the current OSD, so these PeeringOps call _add_slot_waiter()
(3) the shard-queue becomes empty and all three shard-threads block in
    cond.wait()
(4) a new osdmap is consumed and _wake_pg_slot() is called
    The problem is here:
    1> OSDShard::consume() loops over the waiters of every pg slot
       and requeues more than one PeeringOp to the shard-queue
    2> but it notifies only one shard-thread to wake up;
       the other two shard-threads remain in cond.wait()
    3> OSD::ShardedOpWQ::_enqueue() finds the shard-queue non-empty
       and therefore does not notify the shard-threads to wake up

    As a result, for a period of time only one of the three shard-threads
    is running.

Fixes: https://tracker.ceph.com/issues/52781
Signed-off-by: Jianwei Zhang <jianwei1216@qq.com>
Change-Id: I4617db2fd95082007e6d9fa2b60f17f2a6296b5b
(cherry picked from commit 566b60b)
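
The wake-up accounting in the Reason steps can be sketched with a small single-threaded model. This is only an illustration, not Ceph code: ShardModel, requeue_waiters, enqueue and awake_after_requeue are hypothetical names, integers stand in for PeeringOps, and the wake_all branch is one possible remedy that may or may not match the actual patch. What enqueue does when the queue was empty is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy, single-threaded model of one OSD shard. It only counts how many
// worker threads are awake; it does not use real threads or Ceph classes.
struct ShardModel {
    std::size_t sleeping;      // workers blocked in cond.wait()
    std::size_t awake = 0;     // workers that have been notified
    std::deque<int> queue;     // pending ops (ints stand in for PeeringOps)

    explicit ShardModel(std::size_t workers) : sleeping(workers) {
        assert(workers > 0);
    }

    // Wake at most one sleeping worker.
    void notify_one() {
        if (sleeping > 0) { --sleeping; ++awake; }
    }

    // Wake every sleeping worker.
    void notify_all() {
        awake += sleeping;
        sleeping = 0;
    }

    // Models the requeue that happens when a new osdmap is consumed:
    // all parked ops go back on the queue, then one notification is issued.
    void requeue_waiters(const std::deque<int>& waiters, bool wake_all) {
        for (int op : waiters) queue.push_back(op);
        if (waiters.empty()) return;
        if (wake_all) notify_all(); else notify_one();
    }

    // Models the enqueue path from step 3>: no notification is issued when
    // the queue is already non-empty. Waking everyone in the empty case is
    // an assumption of this sketch.
    void enqueue(int op) {
        bool was_empty = queue.empty();
        queue.push_back(op);
        if (was_empty) notify_all();
    }
};

// Returns how many of three workers are awake after three parked ops are
// requeued and one further op is enqueued.
std::size_t awake_after_requeue(bool wake_all) {
    ShardModel shard(3);
    shard.requeue_waiters({1, 2, 3}, wake_all);
    shard.enqueue(4);  // queue is non-empty here, so nobody else is woken
    return shard.awake;
}
```

In this model, calling awake_after_requeue(false) leaves two workers asleep even though several ops are queued, which is the situation described in 2> and 3>. Calling it with true wakes all three.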
@neha-ojha neha-ojha modified the milestones: pacific, v16.2.15 Nov 14, 2023
@yuriw yuriw merged commit 0edb66e into ceph:pacific Nov 28, 2023
8 checks passed
4 participants