New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
osdc/Objecter: fix bugs in explicit naming of op spg_t #13534
Conversation
Rather than recalculate every op on every map (which actually can take up a noticeable amount of time when you multiple every op by 10 microseconds), why not check if the pg count changed and recalculate if so? |
src/osdc/Objecter.cc
Outdated
@@ -1208,18 +1208,29 @@ void Objecter::handle_osd_map(MOSDMap *m) | |||
|
|||
cluster_full = cluster_full || _osdmap_full_flag(); | |||
update_pool_full_map(pool_full_map); | |||
|
|||
// check all outstanding requests on every epoch, including need_resend | |||
for (auto p = need_resend.begin(); p != need_resend.end(); ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hang on, need_resend is empty here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a loop.. the bug comes up when we get multiple maps at once and fail to _calc_target on subsequent maps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh duh. Yeah, this looks good.
We could. I lean toward keeping it simple for now because its (1) easier
to understand, (2) in practice objecter seems to get one map at a time
anyway (so it wouldn't matter), and (3) when it doesn't it's because the
client is sufficiently idle and probably not in a latency-sensitive mode
anyway.
|
Could move the need_resend check outside the loop, inside a check for whether we processed multiple maps - it only needs to be done once right? |
src/osdc/Objecter.cc
Outdated
ldout(cct, 10) << __func__ << " checking op " << p->first << dendl; | ||
int r = _calc_target(&op->target, nullptr); | ||
if (r == RECALC_OP_TARGET_POOL_DNE) { | ||
p = need_resend.erase(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we still need the _check_op_pool_dne() call to cleanup the callback? it won't gen scanned later since _scan_requests() reset op->session when it was added to the resend list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, fixing with
ldout(cct, 10) << __func__ << " checking op " << p->first << dendl; int r = _calc_target(&op->target, nullptr); if (r == RECALC_OP_TARGET_POOL_DNE) { - p = need_resend.erase(p); + ++p; + OSDSession::unique_lock sl(op->session->lock); + _check_op_pool_dne(op, sl); } else { ++p; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
op->session will have been set to NULL by _session_op_remove() when the op was added to need_resend...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ick
Hmm yeah, that's doable.. fixing |
140c408
to
84a1907
Compare
@jdurgin updated |
src/osdc/Objecter.cc
Outdated
ldout(cct, 10) << __func__ << " checking op " << p->first << dendl; | ||
int r = _calc_target(&op->target, nullptr); | ||
if (r == RECALC_OP_TARGET_POOL_DNE) { | ||
++p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still need to remove it from need_resend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rest looks good
84a1907
to
2c82f16
Compare
Signed-off-by: Sage Weil <sage@redhat.com>
Check whether the request hobj maps to the current pg_t. If we have the osd_debug_misdirected_ops setting enabled (as teuthology does), assert out as well so that the error is easy to spot. This catches bugs in the Objecter (especially the new code that explicitly names the spg_t for the request). Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
We need to make sure we update the mapping and get an accurate actual_pgid value by recalcuating the mapping on every map change. Otherwise, we may not notice a split (and subsequent actual_pgid change) and resend the same op with a stale spg_t. To fix this, - _calc_target on need_resend - update target regardless of current con Signed-off-by: Sage Weil <sage@redhat.com>
2c82f16
to
bbd9f0f
Compare
lgtm |
@@ -1713,6 +1713,17 @@ void PrimaryLogPG::do_op(OpRequestRef& op) | |||
hobject_t head = m->get_hobj(); | |||
head.snap = CEPH_NOSNAP; | |||
|
|||
if (!info.pgid.pgid.contains( | |||
info.pgid.pgid.get_split_bits(pool.info.get_pg_num()), head)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @liewegas , I am using kernel rbd (kernel 3.10 in RHEL 7.4) as a client for a 12.2.12 ceph cluster. But I found an error message as below:
2019-07-02 03:58:43.724305 osd.3 osd.3 192.168.15.3:6800/37773 1 : cluster [WRN] 6.0 does not contain 6:06a674d9:::rbd_data.11d36b8b4567.0000000000000023:head op osd_op(client.4615.1:286132 6.9b2e6560 6:06a674d9:::rbd_data.11d36b8b4567.0000000000000023:head [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4096] snapc 0=[] ondisk+write+known_if_redirected e95) v4
meanwhile, a request in client is hanging and never to be complete.:
REQUESTS 1 homeless 0
286132 osd3 6.9b2e6560 [3,5,1]/3 [3,5,1]/3 rbd_data.11d36b8b4567.0000000000000023 0x400024 1 0'0 set-alloc-hint,write
At that time, we are doing pg splitting, and yes, the client op epoch is e95 but server epoch is already e99.
But we reach this warning because the kernel client in RHEL 7.4 is not supporting CEPH_FEATURE_OSD_POOLRESEND. So it was not dropped in can_discard_op():
} else if (m->get_connection()->has_feature(CEPH_FEATURE_OSD_POOLRESEND)) {
// < luminous client
if (m->get_map_epoch() < pool.info.get_last_force_op_resend_preluminous()) {
dout(7) << __func__ << " sent before last_force_op_resend_preluminous "
<< pool.info.last_force_op_resend_preluminous
<< ", dropping" << *m << dendl;
return true;
}
}
My question is, what should we do in this case, when client is not supporting CEPH_FEATURE_OSD_POOLRESEND?
assert on that to force the client resend?
assert(m->get_connection()->has_feature(CEPH_FEATURE_OSD_POOLRESEND))
Fixed this (and tested) as part of wip-faster-dispatch.