
mgr: Release GIL before calling OSDMap::calc_pg_upmaps() #31064

Merged: 3 commits merged into ceph:master from wip-42432 on Nov 7, 2019

Conversation

@dzafman (Contributor) commented Oct 22, 2019

Fixes: https://tracker.ceph.com/issues/42432

Signed-off-by: David Zafman <dzafman@redhat.com>

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug


@dzafman (Contributor, Author) commented Oct 22, 2019

@liewegas @xiexingguo @jdurgin I'm not sure whether I put the PyEval_SaveThread() in the best location, or whether there are ramifications to dropping the GIL there.

I was able to run ceph balancer status and ceph balancer show xxx while a test sleep was running in OSDMap::calc_pg_upmaps().

@jdurgin (Member) commented Oct 22, 2019

Hmm, looking closer I see one potential race: since the balancer module inserts the plan object in optimize()->plan_create(), a user could call execute() on a partially initialized plan (with the incremental in particular still being updated).

To fix this, we could add the plan to self.plans only after optimize() has finished initializing it.
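
A minimal sketch of that ordering (illustrative names only, not the actual module.py code):

    # Hedged sketch (not the real module.py): the plan becomes visible in
    # self.plans only after optimize() has fully initialized it.
    class BalancerSketch:
        def __init__(self):
            self.plans = {}

        def optimize(self, plan):
            # Stand-in for the long-running OSDMap::calc_pg_upmaps() work that
            # fills in the plan's incremental.
            plan['inc'] = ['pg 1.0 mapped to osd.3']
            return 0, 'done'

        def handle_optimize(self, name):
            plan = {'name': name}            # local only; execute() cannot see it yet
            r, detail = self.optimize(plan)  # may take a long time
            if r == 0:
                self.plans[name] = plan      # publish only the finished plan
            return r, detail

        def handle_execute(self, name):
            plan = self.plans.get(name)      # either absent or fully initialized
            if plan is None:
                return -1, 'plan %s not found' % name
            return 0, 'applying %d upmap entries' % len(plan['inc'])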

@dzafman (Contributor, Author) commented Oct 23, 2019

@jdurgin I fixed the issue with self.plans that you pointed out. In addition, the code now disallows the "ceph balancer optimize ...." command while the balancer is active, which prevents optimize from racing in two threads.

There is still a race:

  1. the active balancer is processing in calc_pg_upmaps()
  2. ceph balancer off
  3. ceph balancer optimize ....

I can fix this with a boolean "optimizing" that is set while the active balancer is optimizing. Then even if active = False, optimizing = True will still cause the balancer optimize command to fail.
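
A rough sketch of that guard (hypothetical names, not the actual module.py code):

    import errno

    # Hedged sketch: an "optimizing" flag that the background thread holds for
    # the duration of the calc_pg_upmaps() call, so a manual optimize is refused
    # even if the balancer was switched off mid-run.
    class BalancerGuardSketch:
        def __init__(self):
            self.active = False
            self.optimizing = False

        def background_cycle(self):
            # Runs in the active balancer's thread.
            self.optimizing = True
            try:
                self.run_calc_pg_upmaps()    # stand-in for the long C++ call
            finally:
                self.optimizing = False

        def handle_optimize_command(self, plan_name):
            if self.active or self.optimizing:
                return -errno.EINVAL, '', 'balancer is active or currently optimizing'
            return 0, '', 'plan %s created' % plan_name

        def run_calc_pg_upmaps(self):
            pass                             # placeholder for the real work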

There is a lock in the ceph-mgr for incoming ceph commands, so if a manual "ceph balancer optimize ..." takes a long time, other ceph balancer commands like status will hang as before. At least this prevents activating the balancer while a manual optimize is already running.

I believe that this is the relevant code:
    bool DaemonServer::handle_command(const ref_t<MMgrCommand>& m)
    {
      std::lock_guard l(lock);   // lock held while handling incoming ceph commands
      ...

@tchaikov (Contributor) left a comment

Two resolved review threads on src/pybind/mgr/balancer/module.py
Commits added to this pull request:

  • Prevent optimize and execute commands from running with active balancer
    Fixes: https://tracker.ceph.com/issues/42432
    Signed-off-by: David Zafman <dzafman@redhat.com>

  • Add balancer status fields so that slow optimizations can be detected
    Signed-off-by: David Zafman <dzafman@redhat.com>
@tserong (Contributor) left a comment

I'm not quite sure about this... @dzafman, are you able to confirm whether it's only mgr balancer commands that are affected, or whether all mgr commands block (try ceph osd status, for example)? If the latter is true, we're hitting a bigger problem (see https://tracker.ceph.com/issues/37514).

@mamahtehok commented

> I'm not quite sure about this... @dzafman, are you able to confirm if it's only mgr balancer commands that are affected, or do all mgr commands block (try ceph osd status for example). If the latter is true, we're hitting a bigger problem (see https://tracker.ceph.com/issues/37514)

Hi, in my case a long balancer task also blocked the ceph osd status command.

@dzafman (Contributor, Author) commented Oct 29, 2019

@tserong @mamahtehok This fix only deals with the active balancer background thread. In that case a slow OSDMap::calc_pg_upmaps() will NOT block other commands. So this pull request would fix using ceph osd status with an active balancer.

If a user runs the ceph balancer optimize command manually, it will still block other commands if it is slow, per https://tracker.ceph.com/issues/37514.

@tserong (Contributor) commented Oct 30, 2019

Thanks for the explanation. As for the placement of PyEval_SaveThread() / PyEval_RestoreThread(), I think what you have makes sense; it drops the GIL for a potentially long-running operation that doesn't need access to Python state while it's running, then reacquires it later. The only possible catch I can think of is that if some other balancer command were invoked while calc_pg_upmaps was running (without the GIL), the PyEval_RestoreThread() would block until that command finished, so this change effectively means interactive tasks take precedence over background tasks.

@dzafman requested a review from @tchaikov on November 1, 2019
liewegas added a commit that referenced this pull request Nov 7, 2019
* refs/pull/31064/head:
	test: Test balancer module commands
	mgr: Improve balancer module status
	mgr: Release GIL before calling OSDMap::calc_pg_upmaps()

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
@liewegas merged commit 3a0e2c8 into ceph:master on Nov 7, 2019
@dzafman deleted the wip-42432 branch on November 8, 2019