osd: wake up more shard workers when new osdmap is consumed
Reproduce:
(1) the ceph cluster is not running any client IO
(2) the only operation performed is "ceph osd in osd.14"

Reason:
(1) one shard-queue is served by three shard-threads
(2) the epoch of one or more PeeringOps is greater than the osdmap epoch
    held by the current osd, so these PeeringOps are parked via
    _add_slot_waiter()
(3) the shard-queue becomes empty and all three shard-threads cond.wait()
(4) a new osdmap is consumed and _wake_pg_slot() is called
    The problem is here:
    1> OSDShard::consume_map() loops over every pg slot's waiters
       and requeues more than one PeeringOp to the shard-queue
    2> but it only notifies one shard-thread to wake up, so the other
       two shard-threads stay in cond.wait()
    3> OSD::ShardedOpWQ::_enqueue() then finds the shard-queue non-empty
       and therefore does not notify all shard-threads either

As a result, for a period of time only one of the three shard-threads is
doing work (see the sketch below).
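For illustration only, here is a minimal, self-contained sketch of the
wake-up pattern this fix moves to. The ShardQueue/requeue/worker names are
hypothetical, not the real OSDShard/ShardedOpWQ types: when more than one
item is requeued under the queue lock, notify_all() is needed so every
waiting shard-thread gets a chance to pick up work; notify_one() is only
sufficient for a single item.

// Illustrative sketch: a queue served by three workers, where the producer
// wakes one waiter for a single item and all waiters for a batch.
#include <condition_variable>
#include <deque>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

struct ShardQueue {
  std::mutex lock;
  std::condition_variable cond;
  std::deque<int> items;

  void requeue(const std::vector<int>& batch) {
    std::lock_guard<std::mutex> l(lock);
    for (int v : batch)
      items.push_back(v);
    if (batch.size() == 1)
      cond.notify_one();   // one new item: waking one worker is enough
    else
      cond.notify_all();   // several items: waking only one strands the rest
  }

  void worker(int id) {
    for (;;) {
      std::unique_lock<std::mutex> l(lock);
      cond.wait(l, [this] { return !items.empty(); });
      int v = items.front();
      items.pop_front();
      l.unlock();
      if (v < 0)
        return;            // negative value is a stop sentinel
      std::cout << "worker " << id << " processed " << v << "\n";
    }
  }
};

int main() {
  ShardQueue q;
  std::vector<std::thread> workers;
  for (int i = 0; i < 3; ++i)
    workers.emplace_back([&q, i] { q.worker(i); });
  q.requeue({1, 2, 3});     // batch requeue: notify_all lets all three run
  q.requeue({-1, -1, -1});  // one stop sentinel per worker
  for (auto& t : workers)
    t.join();
}

If the batch case used notify_one() instead, this example would degenerate
to a single busy worker until further enqueues arrive, which is exactly the
behavior described in the Reason section above.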

Fixes: https://tracker.ceph.com/issues/52781
Signed-off-by: Jianwei Zhang <jianwei1216@qq.com>
Change-Id: I4617db2fd95082007e6d9fa2b60f17f2a6296b5b
(cherry picked from commit 566b60b)
jianwei1216 authored and k0ste committed Jul 18, 2023
1 parent 014f9fb commit bad1077
Showing 2 changed files with 13 additions and 6 deletions.
src/osd/OSD.cc — 17 changes: 12 additions & 5 deletions
@@ -10624,7 +10624,7 @@ void OSDShard::consume_map(
   dout(10) << new_osdmap->get_epoch()
            << " (was " << (old_osdmap ? old_osdmap->get_epoch() : 0) << ")"
            << dendl;
-  bool queued = false;
+  int queued = 0;
 
   // check slots
   auto p = pg_slots.begin();
@@ -10651,8 +10651,7 @@
         dout(20) << __func__ << " " << pgid
                  << " pending_peering first epoch " << first
                  << " <= " << new_osdmap->get_epoch() << ", requeueing" << dendl;
-        _wake_pg_slot(pgid, slot);
-        queued = true;
+        queued += _wake_pg_slot(pgid, slot);
       }
       ++p;
       continue;
@@ -10692,14 +10691,18 @@
   }
   if (queued) {
     std::lock_guard l{sdata_wait_lock};
-    sdata_cond.notify_one();
+    if (queued == 1)
+      sdata_cond.notify_one();
+    else
+      sdata_cond.notify_all();
   }
 }
 
-void OSDShard::_wake_pg_slot(
+int OSDShard::_wake_pg_slot(
   spg_t pgid,
   OSDShardPGSlot *slot)
 {
+  int count = 0;
   dout(20) << __func__ << " " << pgid
            << " to_process " << slot->to_process
            << " waiting " << slot->waiting
@@ -10708,12 +10711,14 @@
        i != slot->to_process.rend();
        ++i) {
     scheduler->enqueue_front(std::move(*i));
+    count++;
   }
   slot->to_process.clear();
   for (auto i = slot->waiting.rbegin();
        i != slot->waiting.rend();
        ++i) {
     scheduler->enqueue_front(std::move(*i));
+    count++;
   }
   slot->waiting.clear();
   for (auto i = slot->waiting_peering.rbegin();
@@ -10724,10 +10729,12 @@
     // someday, if we decide this inefficiency matters
     for (auto j = i->second.rbegin(); j != i->second.rend(); ++j) {
       scheduler->enqueue_front(std::move(*j));
+      count++;
     }
   }
   slot->waiting_peering.clear();
   ++slot->requeue_seq;
+  return count;
 }
 
 void OSDShard::identify_splits_and_merges(
src/osd/OSD.h — 2 changes: 1 addition & 1 deletion
@@ -1094,7 +1094,7 @@ struct OSDShard {
     const OSDMapRef& osdmap,
     unsigned *pushes_to_free);
 
-  void _wake_pg_slot(spg_t pgid, OSDShardPGSlot *slot);
+  int _wake_pg_slot(spg_t pgid, OSDShardPGSlot *slot);
 
   void identify_splits_and_merges(
     const OSDMapRef& as_of_osdmap,
