Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rados.py: remove Rados.__del__(); it just causes problems #3119

Merged
2 commits merged into from Dec 10, 2014
Merged

Conversation

dmick
Copy link
Member

@dmick dmick commented Dec 9, 2014

Recent versions of Python contain a change to thread shutdown that
causes ceph to hang on exit; see http://bugs.python.org/issue21963.
As it turns out, this is relatively easy to avoid by not spawning
threads on exit, as Rados.del() will certainly do by calling
shutdown(); I suspect, but haven't proven, that the problem is
that shutdown() tries to start() a threading.Thread() that never
makes it all the way back to signal start().

Fixes: #8767
Signed-off-by: Dan Mick dan.mick@redhat.com

@loic-bot
Copy link

loic-bot commented Dec 9, 2014

SUCCESS: make check http://paste.ubuntu.com/9437045/

:octocat: Sent from GH.

@ghost ghost added bug-fix core labels Dec 9, 2014
@ghost ghost self-assigned this Dec 9, 2014
@ghost
Copy link

ghost commented Dec 9, 2014

s/8767/8797/ in the commit message ;-)

@ghost
Copy link

ghost commented Dec 9, 2014

It looks like this will leak whatever resource shutdown will free. Would it be possible to

            self.librados.rados_shutdown(self.cluster)

instead of

            run_in_thread(self.librados.rados_shutdown, (self.cluster,))

in http://workbench.dachary.org/ceph/ceph/blob/giant/src/pybind/rados.py#L251 ?

@dmick
Copy link
Member Author

dmick commented Dec 10, 2014

It would be possible, but if anyone calls shutdown() explicitly they will block, unlike any other rados call. And, yes, it means that users who do not call shutdown() explicitly will leak resources. This is predicated on the idea that there are few users who 1) use more than one cluster connection from a long-running process, AND 2) don't call shutdown on them, ever. I think the number of such users is close to zero.

@dmick
Copy link
Member Author

dmick commented Dec 10, 2014

Forcepushed commit msg fix. Leaving open for discussion of Loic's point.

@loic-bot
Copy link

SUCCESS: make check on b74e16c output is http://paste.ubuntu.com/9450979/

:octocat: Sent from GH.

@ghost
Copy link

ghost commented Dec 10, 2014

@dmick I'm not familiar with this code base (disclaimer ;-). My suggestion above is because reading http://workbench.dachary.org/ceph/ceph/blob/giant/src/pybind/rados.py#L160 leads me to think that all calls to run_in_thread are synchronous and the thread is used to workaround the fact that timeouts were not implemented in the past. Am I misinterpreting this piece of code ?

@dmick
Copy link
Member Author

dmick commented Dec 10, 2014

run_in_thread is there so that the main Python thread can receive the SIGINT (or other) signals; for all rados operations there is a potential to block, so we deal with that by running them in child threads, and letting the main Python thread still be scheduled so that it can process signals.

Let's chat about this in IRC if you're still awake.

@dmick
Copy link
Member Author

dmick commented Dec 10, 2014

Resolved: document the new requirement, add a release note, and make ceph.in explicitly call shutdown() as a model citizen

Recent versions of Python contain a change to thread shutdown that
causes ceph to hang on exit; see http://bugs.python.org/issue21963.
As it turns out, this is relatively easy to avoid by not spawning
threads on exit, as Rados.__del__() will certainly do by calling
shutdown(); I suspect, but haven't proven, that the problem is
that shutdown() tries to start() a threading.Thread() that never
makes it all the way back to signal start().

Also add a PendingReleaseNote and extra doc comments to clarify.

Fixes: #8797
Signed-off-by: Dan Mick <dan.mick@redhat.com>
@dmick
Copy link
Member Author

dmick commented Dec 10, 2014

Forcepushed according to suggestions above @dachary @jdurgin

@loic-bot
Copy link

FAIL: make check on 20e3fb9 output is http://paste.ubuntu.com/9467699/

:octocat: Sent from GH.

@ghost
Copy link

ghost commented Dec 10, 2014

*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
{}
Traceback (most recent call last):
  File "./ceph", line 885, in 
    cluster_handle.shutdown()
AttributeError: 'NoneType' object has no attribute 'shutdown'
run: 33: return 1
main: 120: code=1

@jdurgin
Copy link
Member

jdurgin commented Dec 10, 2014

Other than the bug pointed out by the make check error, looks good to me

@dmick
Copy link
Member Author

dmick commented Dec 10, 2014

ah, right, cluster_handle might not be set up. duh.

This is mostly a demonstration of good behavior, as the resources will
be reclaimed on exit anyway.

Signed-off-by: Dan Mick <dan.mick@redhat.com>
@loic-bot
Copy link

SUCCESS: make check on b038e8f output is http://paste.ubuntu.com/9468494/

:octocat: Sent from GH.

ghost pushed a commit that referenced this pull request Dec 10, 2014
rados.py: remove Rados.__del__(); it just causes problems

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Loic Dachary <ldachary@redhat.com>
@ghost ghost merged commit d3f2ec3 into master Dec 10, 2014
@dmick dmick deleted the wip-8797 branch March 9, 2015 15:09
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants