osdc/Objecter: fix bugs in explicit naming of op spg_t #13534

liewegas · 2017-02-20T12:47:09Z

Fixed this (and tested) as part of wip-faster-dispatch.

gregsfortytwo · 2017-02-20T17:06:49Z

Rather than recalculate every op on every map (which actually can take up a noticeable amount of time when you multiple every op by 10 microseconds), why not check if the pg count changed and recalculate if so?

gregsfortytwo · 2017-02-20T17:10:48Z

src/osdc/Objecter.cc

@@ -1208,18 +1208,29 @@ void Objecter::handle_osd_map(MOSDMap *m)

 	cluster_full = cluster_full || _osdmap_full_flag();
 	update_pool_full_map(pool_full_map);
+
+	// check all outstanding requests on every epoch, including need_resend
+	for (auto p = need_resend.begin(); p != need_resend.end(); ) {


hang on, need_resend is empty here?

it's a loop.. the bug comes up when we get multiple maps at once and fail to _calc_target on subsequent maps

Oh duh. Yeah, this looks good.

liewegas · 2017-02-20T17:14:39Z

We could. I lean toward keeping it simple for now because its (1) easier to understand, (2) in practice objecter seems to get one map at a time anyway (so it wouldn't matter), and (3) when it doesn't it's because the client is sufficiently idle and probably not in a latency-sensitive mode anyway.

jdurgin · 2017-02-20T17:47:16Z

Could move the need_resend check outside the loop, inside a check for whether we processed multiple maps - it only needs to be done once right?

jdurgin · 2017-02-20T17:51:48Z

src/osdc/Objecter.cc

+	  ldout(cct, 10) << __func__ << "  checking op " << p->first << dendl;
+	  int r = _calc_target(&op->target, nullptr);
+	  if (r == RECALC_OP_TARGET_POOL_DNE) {
+	    p = need_resend.erase(p);


don't we still need the _check_op_pool_dne() call to cleanup the callback? it won't gen scanned later since _scan_requests() reset op->session when it was added to the resend list

good call, fixing with

ldout(cct, 10) << __func__ << " checking op " << p->first << dendl; int r = _calc_target(&op->target, nullptr); if (r == RECALC_OP_TARGET_POOL_DNE) { - p = need_resend.erase(p); + ++p; + OSDSession::unique_lock sl(op->session->lock); + _check_op_pool_dne(op, sl); } else { ++p; }

op->session will have been set to NULL by _session_op_remove() when the op was added to need_resend...

liewegas · 2017-02-20T18:21:23Z

Hmm yeah, that's doable.. fixing

liewegas · 2017-02-20T18:58:28Z

@jdurgin updated

jdurgin · 2017-02-20T19:08:44Z

src/osdc/Objecter.cc

+      ldout(cct, 10) << __func__ << "  checking op " << p->first << dendl;
+      int r = _calc_target(&op->target, nullptr);
+      if (r == RECALC_OP_TARGET_POOL_DNE) {
+	++p;


still need to remove it from need_resend

the rest looks good

Signed-off-by: Sage Weil <sage@redhat.com>

Check whether the request hobj maps to the current pg_t. If we have the osd_debug_misdirected_ops setting enabled (as teuthology does), assert out as well so that the error is easy to spot. This catches bugs in the Objecter (especially the new code that explicitly names the spg_t for the request). Signed-off-by: Sage Weil <sage@redhat.com>

Signed-off-by: Sage Weil <sage@redhat.com>

We need to make sure we update the mapping and get an accurate actual_pgid value by recalcuating the mapping on every map change. Otherwise, we may not notice a split (and subsequent actual_pgid change) and resend the same op with a stale spg_t. To fix this, - _calc_target on need_resend - update target regardless of current con Signed-off-by: Sage Weil <sage@redhat.com>

gregsfortytwo · 2017-02-24T18:17:11Z

lgtm

yangdongsheng · 2019-07-03T07:59:36Z

src/osd/PrimaryLogPG.cc

@@ -1713,6 +1713,17 @@ void PrimaryLogPG::do_op(OpRequestRef& op)
  hobject_t head = m->get_hobj();
  head.snap = CEPH_NOSNAP;

+  if (!info.pgid.pgid.contains(
+	info.pgid.pgid.get_split_bits(pool.info.get_pg_num()), head)) {


Hi @liewegas , I am using kernel rbd (kernel 3.10 in RHEL 7.4) as a client for a 12.2.12 ceph cluster. But I found an error message as below:

2019-07-02 03:58:43.724305 osd.3 osd.3 192.168.15.3:6800/37773 1 : cluster [WRN] 6.0 does not contain 6:06a674d9:::rbd_data.11d36b8b4567.0000000000000023:head op osd_op(client.4615.1:286132 6.9b2e6560 6:06a674d9:::rbd_data.11d36b8b4567.0000000000000023:head [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4096] snapc 0=[] ondisk+write+known_if_redirected e95) v4

meanwhile, a request in client is hanging and never to be complete.:

REQUESTS 1 homeless 0 286132 osd3 6.9b2e6560 [3,5,1]/3 [3,5,1]/3 rbd_data.11d36b8b4567.0000000000000023 0x400024 1 0'0 set-alloc-hint,write

At that time, we are doing pg splitting, and yes, the client op epoch is e95 but server epoch is already e99.

But we reach this warning because the kernel client in RHEL 7.4 is not supporting CEPH_FEATURE_OSD_POOLRESEND. So it was not dropped in can_discard_op():

} else if (m->get_connection()->has_feature(CEPH_FEATURE_OSD_POOLRESEND)) { // < luminous client if (m->get_map_epoch() < pool.info.get_last_force_op_resend_preluminous()) { dout(7) << __func__ << " sent before last_force_op_resend_preluminous " << pool.info.last_force_op_resend_preluminous << ", dropping" << *m << dendl; return true; } }

My question is, what should we do in this case, when client is not supporting CEPH_FEATURE_OSD_POOLRESEND?
assert on that to force the client resend?

assert(m->get_connection()->has_feature(CEPH_FEATURE_OSD_POOLRESEND))

liewegas added bug-fix common wip-sage-testing labels Feb 20, 2017

gregsfortytwo reviewed Feb 20, 2017

View reviewed changes

jdurgin reviewed Feb 20, 2017

View reviewed changes

liewegas force-pushed the wip-objecter-fixes branch from 140c408 to 84a1907 Compare February 20, 2017 18:58

jdurgin reviewed Feb 20, 2017

View reviewed changes

liewegas force-pushed the wip-objecter-fixes branch from 84a1907 to 2c82f16 Compare February 20, 2017 19:27

jdurgin approved these changes Feb 20, 2017

View reviewed changes

jdurgin added the needs-qa label Feb 20, 2017

liewegas removed the wip-sage-testing label Feb 21, 2017

liewegas added 5 commits February 23, 2017 10:29

vstart.sh: osd debug misdirected ops = true

930dc60

Signed-off-by: Sage Weil <sage@redhat.com>

osdc/Objecter: track latest epoch in op_target_t

57dfcb6

Signed-off-by: Sage Weil <sage@redhat.com>

osdc/Objecter: refactor pool dne check to make op->session optional

71f5309

Signed-off-by: Sage Weil <sage@redhat.com>

liewegas force-pushed the wip-objecter-fixes branch from 2c82f16 to bbd9f0f Compare February 23, 2017 15:30

liewegas added the wip-sage-testing label Feb 24, 2017

liewegas merged commit 674ae80 into ceph:master Feb 24, 2017

liewegas deleted the wip-objecter-fixes branch February 24, 2017 18:55

yangdongsheng reviewed Jul 3, 2019

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

osdc/Objecter: fix bugs in explicit naming of op spg_t #13534

osdc/Objecter: fix bugs in explicit naming of op spg_t #13534

liewegas commented Feb 20, 2017 •

edited

gregsfortytwo commented Feb 20, 2017

gregsfortytwo Feb 20, 2017

liewegas Feb 20, 2017

gregsfortytwo Feb 20, 2017

liewegas commented Feb 20, 2017 via email

jdurgin commented Feb 20, 2017

jdurgin Feb 20, 2017

liewegas Feb 20, 2017

jdurgin Feb 20, 2017

liewegas Feb 20, 2017

liewegas commented Feb 20, 2017

liewegas commented Feb 20, 2017

jdurgin Feb 20, 2017

jdurgin Feb 20, 2017

gregsfortytwo commented Feb 24, 2017

yangdongsheng Jul 3, 2019 •

edited

osdc/Objecter: fix bugs in explicit naming of op spg_t #13534

osdc/Objecter: fix bugs in explicit naming of op spg_t #13534

Conversation

liewegas commented Feb 20, 2017 • edited

gregsfortytwo commented Feb 20, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

liewegas commented Feb 20, 2017 via email

jdurgin commented Feb 20, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

liewegas commented Feb 20, 2017

liewegas commented Feb 20, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregsfortytwo commented Feb 24, 2017

yangdongsheng Jul 3, 2019 • edited

Choose a reason for hiding this comment

liewegas commented Feb 20, 2017 •

edited

yangdongsheng Jul 3, 2019 •

edited