osdc/Objecter: fix bugs in explicit naming of op spg_t #13534

Merged
liewegas merged 5 commits into ceph:master from liewegas:wip-objecter-fixes Feb 24, 2017

3 participants
@liewegas
Member

liewegas commented Feb 20, 2017

Fixed this (and tested) as part of wip-faster-dispatch.

@gregsfortytwo

Member

gregsfortytwo commented Feb 20, 2017

Rather than recalculating every op on every map (which can take a noticeable amount of time when you multiply every op by 10 microseconds), why not check whether the pg count changed and recalculate only if so?
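One way the suggestion above could look, as a minimal sketch rather than the Objecter's actual code: compare each pool's pg count between the previous and the new map, and recalculate targets only for pools where it differs. `PoolId`, `PgCounts`, and `pool_pg_num_changed` are hypothetical names invented for this illustration; the real OSDMap API is not used here.

```cpp
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-in for per-pool data taken from an OSDMap.
using PoolId = int64_t;
using PgCounts = std::map<PoolId, uint32_t>;  // pool id -> pg count

// Returns true when ops in `pool` need their target recalculated because
// the pool's pg count differs between the two maps, or because the pool
// exists in exactly one of them. A pool absent from both maps is unchanged.
bool pool_pg_num_changed(const PgCounts& old_counts,
                         const PgCounts& new_counts,
                         PoolId pool) {
  auto a = old_counts.find(pool);
  auto b = new_counts.find(pool);
  const bool in_old = a != old_counts.end();
  const bool in_new = b != new_counts.end();
  if (!in_old || !in_new) {
    return in_old != in_new;
  }
  return a->second != b->second;
}
```

A caller would skip the per-op recalculation for any pool where this returns false. The sketch only covers pg count changes; any other reason an op's mapping might change is outside its scope.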

@@ -1208,18 +1208,29 @@ void Objecter::handle_osd_map(MOSDMap *m)
          cluster_full = cluster_full || _osdmap_full_flag();
          update_pool_full_map(pool_full_map);
          // check all outstanding requests on every epoch, including need_resend
          for (auto p = need_resend.begin(); p != need_resend.end(); ) {

@gregsfortytwo

gregsfortytwo Feb 20, 2017

Member

hang on, need_resend is empty here?

@liewegas

liewegas Feb 20, 2017

Member

it's a loop... the bug comes up when we get multiple maps at once and fail to _calc_target on the subsequent maps
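To illustrate the multiple-maps case, here is a minimal sketch under stated assumptions. `Op`, `Epoch`, `recalc`, and `handle_maps` are hypothetical stand-ins, and placement is simplified to `hash % pg_num`, which is not how the real code maps objects to pgs. The point is only the loop shape: every op waiting in `need_resend` is recalculated on every epoch in the batch, not just on the first one.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins; none of these are the real Objecter types.
struct Op {
  uint32_t hash;   // object hash
  uint32_t pg;     // pg the op currently targets
  uint32_t epoch;  // epoch the target was last calculated against
};

struct Epoch {
  uint32_t e;       // epoch number
  uint32_t pg_num;  // pg count of the op's pool in this epoch
};

// Simplified placement (assumption): plain modulo instead of a stable mod.
void recalc(Op& op, const Epoch& m) {
  op.pg = op.hash % m.pg_num;
  op.epoch = m.e;
}

// Walk every incoming map and recalculate every queued op each time, so an
// op queued while handling the first map still sees later splits.
void handle_maps(const std::vector<Epoch>& maps,
                 std::map<uint64_t, Op>& need_resend) {
  for (const Epoch& m : maps) {
    for (auto& kv : need_resend) {
      recalc(kv.second, m);
    }
  }
}
```

With maps for epochs 1 (8 pgs) and 2 (16 pgs) arriving together, an op with hash 11 ends up targeting pg 11 at epoch 2. If the recalculation were skipped after the first map, it would stay on pg 3.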

@gregsfortytwo

gregsfortytwo Feb 20, 2017

Member

Oh duh. Yeah, this looks good.

@liewegas

Member

liewegas commented Feb 20, 2017

@jdurgin

Member

jdurgin commented Feb 20, 2017

Could move the need_resend check outside the loop, inside a check for whether we processed multiple maps. It only needs to be done once, right?

          ldout(cct, 10) << __func__ << " checking op " << p->first << dendl;
          int r = _calc_target(&op->target, nullptr);
          if (r == RECALC_OP_TARGET_POOL_DNE) {
            p = need_resend.erase(p);

@jdurgin

jdurgin Feb 20, 2017

Member

Don't we still need the _check_op_pool_dne() call to clean up the callback? It won't get scanned later, since _scan_requests() reset op->session when it was added to the resend list.

@liewegas

liewegas Feb 20, 2017

Member

good call, fixing with

          ldout(cct, 10) << __func__ << "  checking op " << p->first << dendl;
          int r = _calc_target(&op->target, nullptr);
          if (r == RECALC_OP_TARGET_POOL_DNE) {
-           p = need_resend.erase(p);
+           ++p;
+           OSDSession::unique_lock sl(op->session->lock);
+           _check_op_pool_dne(op, sl);
          } else {
            ++p;
          }

@jdurgin

jdurgin Feb 20, 2017

Member

op->session will have been set to NULL by _session_op_remove() when the op was added to need_resend...

@liewegas

Member

liewegas commented Feb 20, 2017

Hmm yeah, that's doable.. fixing

@liewegas

Member

liewegas commented Feb 20, 2017

@jdurgin updated

          ldout(cct, 10) << __func__ << " checking op " << p->first << dendl;
          int r = _calc_target(&op->target, nullptr);
          if (r == RECALC_OP_TARGET_POOL_DNE) {
            ++p;

@jdurgin

jdurgin Feb 20, 2017

Member

still need to remove it from need_resend
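The two review points in this thread (clean up the callback, and still remove the op from need_resend, without touching a session pointer that is already null) can be combined in one erase-while-iterating loop. This is a minimal sketch under stated assumptions, not the merged fix: `Session`, `Op`, `pool_exists`, `on_finish`, and `prune_dne_ops` are hypothetical, and completing the callback immediately with `-ENOENT` is an assumption standing in for the real `_check_op_pool_dne()` logic, which is not reproduced here.

```cpp
#include <cerrno>
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical stand-ins; the real OSDSession and Op types are not used.
struct Session {};

struct Op {
  Session* session = nullptr;          // already cleared for queued ops
  bool pool_exists = true;             // stand-in for the _calc_target result
  std::function<void(int)> on_finish;  // completion callback
};

// Drops every queued op whose pool no longer exists. The callback is
// completed first so it is not leaked, then the entry is erased using the
// iterator returned by erase(). op.session is never dereferenced.
// Returns the number of ops dropped.
int prune_dne_ops(std::map<uint64_t, Op>& need_resend) {
  int dropped = 0;
  for (auto p = need_resend.begin(); p != need_resend.end(); ) {
    Op& op = p->second;
    if (!op.pool_exists) {
      if (op.on_finish) {
        op.on_finish(-ENOENT);  // assumption: report "no such pool"
      }
      p = need_resend.erase(p);
      ++dropped;
    } else {
      ++p;
    }
  }
  return dropped;
}
```

Using the return value of `erase()` keeps the loop iterator valid, which is why the loop header has no increment clause.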

@jdurgin

jdurgin Feb 20, 2017

Member

the rest looks good

liewegas added some commits Feb 18, 2017

vstart.sh: osd debug misdirected ops = true
Signed-off-by: Sage Weil <sage@redhat.com>

osd: warn on ops directed to the wrong pg_t
Check whether the request hobj maps to the current pg_t.  If we have the
osd_debug_misdirected_ops setting enabled (as teuthology does), assert out
as well so that the error is easy to spot.  This catches bugs in the
Objecter (especially the new code that explicitly names the spg_t for the
request).

Signed-off-by: Sage Weil <sage@redhat.com>

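The check this commit message describes can be sketched as follows. This is a minimal illustration, not the OSD's actual code: `stable_mod` is a local reimplementation modeled on Ceph's `ceph_stable_mod`, while `pg_mask` and `op_is_misdirected` are hypothetical helpers invented here. It shows why an op that names its pg explicitly goes stale after a split.

```cpp
#include <cstdint>

// Local reimplementation modeled on Ceph's ceph_stable_mod: maps hash x
// onto [0, b) so that growing b moves as few values as possible.
// bmask must be the smallest 2^n - 1 that is >= b - 1.
inline uint32_t stable_mod(uint32_t x, uint32_t b, uint32_t bmask) {
  return ((x & bmask) < b) ? (x & bmask) : (x & (bmask >> 1));
}

// Hypothetical helper: computes the mask for a given pg count (pg_num > 0).
inline uint32_t pg_mask(uint32_t pg_num) {
  uint32_t mask = 1;
  while (mask < pg_num) {
    mask <<= 1;
  }
  return mask - 1;
}

// Hypothetical helper: true if the pg seed named by the client does not
// match the pg the object hash maps to under the current pg count.
inline bool op_is_misdirected(uint32_t hash, uint32_t claimed_seed,
                              uint32_t pg_num) {
  return stable_mod(hash, pg_num, pg_mask(pg_num)) != claimed_seed;
}
```

For example, hash 0xb maps to pg 3 with 8 pgs and to pg 11 with 16 pgs, so an op that still names pg 3 after the split would be flagged.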
osdc/Objecter: track latest epoch in op_target_t
Signed-off-by: Sage Weil <sage@redhat.com>

osdc/Objecter: _calc_target on all ops so that we notice splits
We need to make sure we update the mapping and get an accurate actual_pgid
value by recalculating the mapping on every map change.  Otherwise, we may
not notice a split (and subsequent actual_pgid change) and resend the same
op with a stale spg_t.  To fix this,

- _calc_target on need_resend
- update target regardless of current con

Signed-off-by: Sage Weil <sage@redhat.com>

@gregsfortytwo

Member

gregsfortytwo commented Feb 24, 2017

lgtm

@liewegas liewegas merged commit 674ae80 into ceph:master Feb 24, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Unmodifed Submodules submodules for project are unmodified
default Build finished.

@liewegas liewegas deleted the liewegas:wip-objecter-fixes branch Feb 24, 2017
