Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

osd: pg: be more careful with locking around forced pg recovery #16712

Merged
merged 1 commit into from Aug 1, 2017

Conversation

Projects
None yet
4 participants
@gregsfortytwo
Copy link
Member

gregsfortytwo commented Jul 31, 2017

This does several little things that add up to big concurrency and safety
improvements:

  • Switch to passing around PGRefs instead of raw pointers, which is
    generally a good idea
  • drop the pg_map_lock once we're done looking up the PGRefs, since
    we don't need it and holding the PG pointer alive was the only previous
    thing that might have made it necessary
  • don't hold the recovery_lock since we don't need any OSD-level
    synchronization
  • make sure the PG is not being deleted before we do a force-change of its
    state

Fixes: http://tracker.ceph.com/issues/20808

Signed-off-by: Greg Farnum gfarnum@redhat.com

@gregsfortytwo

This comment has been minimized.

Copy link
Member Author

gregsfortytwo commented Jul 31, 2017

This is only compile-tested, but the original PR came with tests so I assume running it through teuthology and make check is all we need. :)

@jdurgin

This comment has been minimized.

Copy link
Member

jdurgin commented Jul 31, 2017

This looks good - still need to have the pg_lock protecting the read via get_state() though - maybe move taking that lock into the loops over pgs

@gregsfortytwo gregsfortytwo force-pushed the gregsfortytwo:wip-20808 branch from c3df5f7 to 0aa85c8 Aug 1, 2017

@gregsfortytwo

This comment has been minimized.

Copy link
Member Author

gregsfortytwo commented Aug 1, 2017

Right, updated.

@jdurgin

This comment has been minimized.

Copy link
Member

jdurgin commented Aug 1, 2017

need the lock for the cancel loop too

@branch-predictor

This comment has been minimized.

Copy link
Member

branch-predictor commented Aug 1, 2017

Doesn't compile:

/home/ubuntu/gf-forcedrecov/ceph/src/osd/OSD.cc: In member function ‘void OSDService::adjust_pg_priorities(const std::vector<boost::intrusive_ptr<PG> >&, int)’:                                                                             
/home/ubuntu/gf-forcedrecov/ceph/src/osd/OSD.cc:9229:10: error: ‘class PG’ has no member named ‘change_recovery_force_mode’                                   
       i->change_recovery_force_mode(newstate, true);                          
          ^                                                                    
make[2]: *** [src/osd/CMakeFiles/osd.dir/OSD.cc.o] Error 1                     
make[2]: *** Waiting for unfinished jobs....                                   
osd: pg: be more careful with locking around forced pg recovery
This does several little things that add up to big concurrency and safety
improvements:
* Switch to passing around PGRefs instead of raw pointers, which is
  generally a good idea
* drop the pg_map_lock once we're done looking up the PGRefs, since
  we don't need it and holding the PG pointer alive was the only previous
  thing that might have made it necessary
* don't hold the recovery_lock since we don't need any OSD-level
  synchronization
* make sure the PG is not being deleted before we do a force-change of its
  state

Fixes: http://tracker.ceph.com/issues/20808

Signed-off-by: Greg Farnum <gfarnum@redhat.com>
@@ -9233,10 +9233,12 @@ void OSDService::adjust_pg_priorities(vector<PG*> pgs, int newflags)
// make sure the PG is in correct state before forcing backfill or recovery, or
// else we'll make PG keeping FORCE_* flag forever, requiring osds restart
// or forcing somehow recovery/backfill.
i->lock();

This comment has been minimized.

Copy link
@branch-predictor

branch-predictor Aug 1, 2017

Member

As @jdurgin wrote, OFR_CANCEL needs similar lock (and **_**change_recovery_force_mode).

@gregsfortytwo gregsfortytwo force-pushed the gregsfortytwo:wip-20808 branch from 0aa85c8 to f19a3d9 Aug 1, 2017

@gregsfortytwo

This comment has been minimized.

Copy link
Member Author

gregsfortytwo commented Aug 1, 2017

@gregsfortytwo grumbles about having it right on his server and git amend commit UIs

@branch-predictor
Copy link
Member

branch-predictor left a comment

Now it looks right.

@jdurgin

jdurgin approved these changes Aug 1, 2017

@liewegas liewegas merged commit d62716b into ceph:master Aug 1, 2017

3 of 4 checks passed

make check make check failed
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check (arm64) make check succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.