Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: initial minimal efforts to clean up PG interface #17708

Merged
merged 72 commits into from Oct 9, 2017

Conversation

Projects
None yet
7 participants
@liewegas
Copy link
Member

commented Sep 13, 2017

@branch-predictor the biggest change here is the clean up of the forced recovery interface

@liewegas liewegas requested a review from branch-predictor Sep 13, 2017

@joscollin

This comment has been minimized.

Copy link
Member

commented Sep 14, 2017

@liewegas Build failed.

/home/jenkins-build/build/workspace/ceph-pull-requests/src/osd/PG.h:2443:15: error: ‘static bool PG::_has_removal_flag(ObjectStore*, spg_t)’ is protected
   static bool _has_removal_flag(ObjectStore *store, spg_t pgid);
               ^
/home/jenkins-build/build/workspace/ceph-pull-requests/src/tools/ceph_objectstore_tool.cc:379:62: error: within this context
        (it->is_pg(&pgid) && PG::_has_removal_flag(store, pgid))) {
                                                              ^
@branch-predictor

This comment has been minimized.

Copy link
Contributor

commented Sep 14, 2017

@liewegas I don't like producing a new log line for each of pg involved (and right now we even don't know which pgs are involved), but otherwise changes to forced recovery seem OK.

@liewegas

This comment has been minimized.

Copy link
Member Author

commented Sep 14, 2017

It's at debug level 10... should only be seen by developers?

@branch-predictor

This comment has been minimized.

Copy link
Contributor

commented Sep 14, 2017

@liewegas true, but if something breaks and you debug production, you'd prefer to increase log level for a few minutes over attaching gdb to running osd, risking I/O stalls that could be hurting cluster users.

@liewegas liewegas force-pushed the liewegas:wip-pg branch from 3149a01 to 090ee2c Sep 15, 2017

@liewegas

This comment has been minimized.

Copy link
Member Author

commented Sep 15, 2017

I switched it to 20.. is that better? I think it's helpful to print when we actually switch the state. If necessary we can add back the pgid list in the caller, but i'm not sure it's needed?

@branch-predictor

This comment has been minimized.

Copy link
Contributor

commented Sep 18, 2017

10 or 20 doesn't make much of difference as long as it's there. In case the pgs won't get forced flags, you may need to find out why, whether that was wrongly issued command that missed some pgs (for example, a bug in some script that calls forced recovery) or bad timing that caused pgs in question to change state to one that's invalid for these commands after issuing them.

@liewegas liewegas force-pushed the liewegas:wip-pg branch 2 times, most recently from c189e59 to 083dbfa Sep 18, 2017

@liewegas

This comment has been minimized.

Copy link
Member Author

commented Sep 18, 2017

The last commit adds a single debug line with a summary.. does that help?

@liewegas liewegas force-pushed the liewegas:wip-pg branch from 380520a to 57a2885 Sep 18, 2017

@branch-predictor

This comment has been minimized.

Copy link
Contributor

commented Sep 19, 2017

Yeah, looks reasonably except in case when the "did" set is empty.

@liewegas liewegas force-pushed the liewegas:wip-pg branch 3 times, most recently from efbcefe to 75119b4 Sep 19, 2017

@liewegas liewegas requested a review from tchaikov Sep 22, 2017

@@ -314,7 +314,12 @@ class PG : public DoutPrefixProvider {

ZTracer::Endpoint trace_endpoint;

void lock_suspend_timeout(ThreadPool::TPHandle &handle);
void lock_suspend_timeout(ThreadPool::TPHandle &handle) {

This comment has been minimized.

Copy link
@tchaikov

tchaikov Sep 25, 2017

Contributor

looks good, but i think the compiler will do this with -O2.

@@ -625,6 +625,10 @@ class PG : public DoutPrefixProvider {

set<int> blocked_by; ///< osds we are blocked by (for pg stats)

public:
const vector<int> get_acting() const {

This comment has been minimized.

Copy link
@tchaikov

tchaikov Sep 25, 2017

Contributor

better off returning a reference .

virtual bool agent_work(int max) = 0;
virtual bool agent_work(int max, int agent_flush_quota) = 0;
virtual void agent_stop() = 0;
virtual void agent_delay() = 0;
virtual void agent_clear() = 0;
virtual void agent_choose_mode_restart() = 0;
protected:

This comment has been minimized.

Copy link
@tchaikov

tchaikov Sep 25, 2017

Contributor

nit, move protected: one line down:

void _finish_recovery(Context *c);
protected:
struct C_PG_FinishRecovery : public Context {

This comment has been minimized.

Copy link
@tchaikov

tchaikov Sep 25, 2017

Contributor

would it be easier if we use FunctionContext or LambdaContext?

@@ -372,13 +372,15 @@ class PG : public DoutPrefixProvider {
pg_info_t info; ///< current pg info
pg_info_t last_written_info; ///< last written info
__u8 info_struct_v;
public:

This comment has been minimized.

Copy link
@tchaikov

tchaikov Sep 25, 2017

Contributor

why are we marking cur_struct_v and compat_struct_v public? we have their accessors already.

@@ -8204,6 +8204,23 @@ void PG::dump_pgstate_history(Formatter *f)
unlock();
}

void PG::dump_missing(Formatter *f)

This comment has been minimized.

Copy link
@tchaikov

tchaikov Sep 25, 2017

Contributor

could mark this method const.

for (auto l : missing_loc.get_locations(i.first)) {
f->dump_object("shard", l);
}
f->close_section();

This comment has been minimized.

Copy link
@tchaikov

tchaikov Sep 25, 2017

Contributor

missing a close_section() call here.

This comment has been minimized.

Copy link
@tchaikov

tchaikov Oct 4, 2017

Contributor

@liewegas not addressed

}

fout << "*** osd " << whoami << ": dump_missing ***" << std::endl;
JSONFormatter jf(true);

This comment has been minimized.

Copy link
@tchaikov

tchaikov Sep 25, 2017

Contributor

might want to use f is it is not null?

@@ -11018,22 +11018,22 @@ bool PrimaryLogPG::start_recovery_ops(
int unfound = get_num_unfound();
if (unfound) {
dout(10) << " still have " << unfound << " unfound" << dendl;
return work_in_progress;
return !work_in_progress;

This comment has been minimized.

Copy link
@tchaikov

tchaikov Sep 25, 2017

Contributor

if work_in_progress is true, we will return at 2b432c1#diff-fb41013d27e932534adb50eb3de2aaa5R11007, so !work_in_progress always evaluates to true here.

This comment has been minimized.

Copy link
@tchaikov

tchaikov Oct 4, 2017

Contributor

@liewegas not addressed

This comment has been minimized.

Copy link
@liewegas

liewegas Oct 4, 2017

Author Member

it's fixed in ea32e82, not sure why github is still showing this

@@ -5817,6 +5817,7 @@ void PG::handle_loaded(RecoveryCtx *rctx)
dout(10) << "handle_loaded" << dendl;
Load evt;
recovery_state.handle_event(evt, rctx);
write_if_dirty(*rctx->transaction);

This comment has been minimized.

Copy link
@tchaikov

tchaikov Sep 25, 2017

Contributor

i am not sure if this works. in OSD::load_pgs(), a nullptr is passed in as the transaction of rctx. like

    PG::RecoveryCtx rctx(0, 0, 0, 0, 0, 0);
    pg->handle_loaded(&rctx);

can we dereference it here w/o checking it beforehand?

This comment has been minimized.

Copy link
@liewegas

liewegas Oct 4, 2017

Author Member

fixed this one already

@liewegas liewegas force-pushed the liewegas:wip-pg branch from 75119b4 to d78b737 Sep 25, 2017

@liewegas liewegas force-pushed the liewegas:wip-pg branch 4 times, most recently from 1627ebd to ebb70e7 Sep 26, 2017

@liewegas liewegas force-pushed the liewegas:wip-pg branch from b8c8438 to 3af0644 Oct 3, 2017

liewegas added some commits Sep 18, 2017

osd/PG: make get_primary() public
Signed-off-by: Sage Weil <sage@redhat.com>
osd/PG: make on_removal() public
Signed-off-by: Sage Weil <sage@redhat.com>
osd/PG: make update_snap_mapper_bits public
Signed-off-by: Sage Weil <sage@redhat.com>
osd: drop redundant is_active() check before scrub_sched()
Signed-off-by: Sage Weil <sage@redhat.com>
osd/PG: drop OSD friend, finally!
Signed-off-by: Sage Weil <sage@redhat.com>
osd: print summary for forced backfill/recovery to debug
Signed-off-by: Sage Weil <sage@redhat.com>
ceph-objectstore-tool: do not clear same_interval_since and PastInter…
…vals

We dropped the OSD recalculation code for this.  Import it.

Signed-off-by: Sage Weil <sage@redhat.com>
ceph-objectstore-tool: remove rm-past-intervals op
The OSD doesn't rebuild this on demand anymore.

Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas force-pushed the liewegas:wip-pg branch from 2c0c325 to 2aeb1d2 Oct 6, 2017

@liewegas

This comment has been minimized.

Copy link
Member Author

commented Oct 6, 2017

@dzafman @jdurgin Okay, got to the bottom of the import vs osdmap epoch thing.

Previously,

  • import blindly set pg epoch to osd's latest
  • import cleared same_interval_since, which told the osd to rebuild history and past intervals
  • osd would rebuild on startup

Now,

  • cot will fail if the pg file's epoch is < osd's oldest_map. unless --force is passed, but that means the PastIntervals won't be 100% accurate. (This is no different than before; it's just now explicit in cot instead of somethign you might notice by digging through the OSD log. In practice nobody is likely to be importing PGs from before the cluster was last clean.)
  • cot will walk the pg through osdmaps up to current_epoch, updating history (same_*_since) and PastIntervals

The net result is that the imported PG will already be up to date when the OSD starts, vs previously where it would have to do a rebuild on startup and there would be some hidden activity slowing osd startup.

(See the last 5 commits or so)

liewegas added some commits Oct 6, 2017

ceph-objectstore-tool: refuse to import PG older that OSD's oldest_epoch
We don't have a way to construct a valid PastIntervals history.

Signed-off-by: Sage Weil <sage@redhat.com>
ceph-objectstore-tool: drop unused biginfo_oid
Signed-off-by: Sage Weil <sage@redhat.com>
ceph-objectstore-tool: bring past_intervals up to date on import
We can't blinding fast-forward the pg epoch to the latest map without
also updating the pg history and PastIntervals.  Do that at import time.

Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas force-pushed the liewegas:wip-pg branch from 2aeb1d2 to 8ddaec1 Oct 6, 2017

@dzafman

This comment has been minimized.

Copy link
Member

commented Oct 6, 2017

@liewegas The last commits particularly the changes to ceph_objectstore_tool.cc look good to me.

osd/PGLog: drop old compat coll/oid args to read_log_and_missing
The oldest version we care about is 10, and these were for <8.

Signed-off-by: Sage Weil <sage@redhat.com>

@liewegas liewegas force-pushed the liewegas:wip-pg branch from 8ddaec1 to 582ee8d Oct 6, 2017

@liewegas liewegas merged commit 96ddf5c into ceph:master Oct 9, 2017

5 checks passed

Docs: build check OK - docs built
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
make check (arm64) make check succeeded
Details

@liewegas liewegas deleted the liewegas:wip-pg branch Oct 9, 2017

@asomers

This comment has been minimized.

Copy link
Contributor

commented on 7c1cba8 Oct 9, 2017

I believe this commit is breaking the build with Clang. See http://tracker.ceph.com/issues/21739 .

@smithfarm

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2017

@liewegas This PR is pretty big, but its absence in luminous is causing quite some conflicts - see http://tracker.ceph.com/issues/22069 . . . @dzafman suggests backporting this PR to luminous - do you agree?

@smithfarm

This comment has been minimized.

Copy link
Contributor

commented Dec 13, 2017

(Optimistically opened http://tracker.ceph.com/issues/22423 for the backport.)

@liewegas

This comment has been minimized.

Copy link
Member Author

commented Dec 13, 2017

It makes me very nervous to backport this. Between luminous and this PR a ton of compatibility code for <luminous was ripped out, so I'm worried that there is some dependency (although I guess that will turn on during a backport).

FWIW I was planning on backporting the recovery reservation fixes and it was too painful... maybe this would help, but I'm also okay with not backporting it at all.

I guess give it a go and see how it goes?

@liewegas

This comment has been minimized.

Copy link
Member Author

commented Dec 13, 2017

For example, 72d50e4 can't be backported.

@liewegas

This comment has been minimized.

Copy link
Member Author

commented Dec 13, 2017

I think the past intervals stuff shoudl backport okay. Just pay attention when you're working through the patches for anything that's needed for kraken and jewel compat!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.