Exceptions during Python module cleanup #106

Closed
JoshRosen opened this issue Aug 8, 2012 · 21 comments

@JoshRosen
Contributor

After refactoring some code in an application that uses Py4J (simply splitting the code that interacts with Py4J into a separate file), I noticed exceptions like

Exception TypeError: "'NoneType' object is not callable" in <function <lambda> at 0x10213e5f0> ignored

when my application shut down.

Here's a small test case that reproduces the problem with Python 2.7.1 (on my machine, at least):

In test.py:

from py4j.java_gateway import JavaGateway
gateway = JavaGateway.launch_gateway(die_on_exit=True)

Running python test.py produces no errors. In test2.py, I simply import test:

import test

Running python test2.py results in

Exception TypeError: 'isinstance() arg 2 must be a class, type, or tuple of classes and types' in <function <lambda> at 0x1069f1758> ignored

The cause appears to be related to the order in which modules are cleaned up when Python shuts down. When running test.py, the JavaGateway instance (in __main__) is cleaned up before any of the py4j modules are cleaned up:

$ python -v test.py
...
# cleanup __main__
# cleanup[1] py4j
...
# cleanup[1] py4j.protocol

Compare this to running test2.py, which imports test:

$ python -v test2.py
...
# cleanup __main__
# cleanup[1] py4j
...
# cleanup[1] py4j.finalizer
...
# cleanup[1] py4j.java_collections
...
# cleanup[1] py4j.protocol
...
# cleanup[1] test
...
# cleanup[1] py4j.java_gateway
# cleanup[1] py4j.compat
...

In this case, py4j modules are cleaned up before the test module and the JavaGateway instance are cleaned up.

The cause of this particular TypeError exception isn't immediately clear because of the lambda function. After renaming the lambda functions in Py4J, it appears that it refers to the weakref.ref callback in JavaObject. I suspect that this problem may be due to the weakref callback calling functions from modules that may have already been cleaned up. It looks like this specific 'isinstance() arg 2 must be a class, type, or tuple of classes and types' exception might be occurring in an isinstance(s, unicode) call in smart_decode, which is odd.
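
To illustrate the suspected mechanism, here is a minimal sketch that does not use Py4J at all. The `helpers` module object, `Tracked`, and `make_callback` are hypothetical names invented for this example; setting `helpers.smart_decode = None` only imitates what Python 2 does to a module's globals at shutdown. The callback checks that the helper still exists before calling it, so the second collection passes quietly instead of raising "'NoneType' object is not callable".

```python
import gc
import types
import weakref

# Hypothetical stand-in for a library module whose globals are cleared at shutdown.
helpers = types.ModuleType("helpers")
helpers.smart_decode = lambda value: str(value)


class Tracked(object):
    """Placeholder for an object that is tracked through a weak reference."""


released = []  # ids that the callback managed to process


def make_callback(target_id):
    def on_collect(ref):
        # Look the helper up at call time; it may have been reset to None.
        decode = helpers.smart_decode
        if decode is None:
            return  # module already torn down, nothing safe to do
        released.append(decode(target_id))
    return on_collect


def track_and_drop(target_id):
    obj = Tracked()
    ref = weakref.ref(obj, make_callback(target_id))
    del obj       # drops the last strong reference, so the callback fires
    gc.collect()  # harmless on CPython, helps on other interpreters
    return ref


track_and_drop("o1")         # helper still available: callback records the id
helpers.smart_decode = None  # imitate module teardown
track_and_drop("o2")         # callback returns quietly instead of raising
```

Without the `if decode is None` check, the second call would hit the same kind of TypeError that is reported above as "ignored".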

Interestingly, adding an extra, unused py4j import in test.py fixes the exception and causes all of the py4j modules to be cleaned up after the test module and JavaGateway instance.

I'm not sure if this is a serious issue:

  • It requires a fairly specific pattern of imports and module dependencies to reproduce.
  • It only occurs during Python shutdown.
  • There's an easy (if non-obvious) workaround.
@ghost ghost assigned bartdag Aug 8, 2012
@bartdag
Collaborator

bartdag commented Aug 8, 2012

Very interesting bug. I'll keep it open, but might not fix it for the next release because as you say, it does not occur often and there is a workaround. Thanks!

@bartdag
Collaborator

bartdag commented Dec 23, 2013

I may have time to look at this bug during the holidays. Is it possible to include the source code of test.py and test2.py?

@JoshRosen
Contributor Author

I think that the issue above contains the full source of test.py and test2.py:

test.py:

from py4j.java_gateway import JavaGateway
gateway = JavaGateway.launch_gateway(die_on_exit=True)

test2.py:

import test

@bartdag
Collaborator

bartdag commented Dec 26, 2013

Sorry, I missed that part. Thanks!

@nchammas

Just curious: Is this still an issue?

@bartdag
Collaborator

bartdag commented Dec 17, 2014

I never found the time to look more seriously into it so the problem might still exist as of today.

@behdad84

behdad84 commented Jun 5, 2015

Just curious: Is this still an issue?

@lessthanoptimal

I'm still getting it. It might be related to enums: I just referenced a few more, and the number of messages on cleanup increased.

@lessthanoptimal

I would like to add that the workaround proposed above, importing py4j, doesn't seem to work on my system. I would say this issue is a nuisance right now because it spews out about 70% of a terminal of text, which hides useful information from the user because it scrolls away.

@bartdag
Collaborator

bartdag commented Dec 28, 2015

@lessthanoptimal I was not able to replicate the problem with the code in master. Can you either create another reproducible example or test the following:

wrap the code in the methods py4j.java_gateway._garbage_collect_object and py4j.java_gateway._garbage_collect_connection in a try/except block. For example:

def _garbage_collect_object(gateway_client, target_id):
    try:
        ThreadSafeFinalizer.remove_finalizer(
            smart_decode(gateway_client.address) +
            smart_decode(gateway_client.port) +
            target_id)
        if target_id != proto.ENTRY_POINT_OBJECT_ID and\
                target_id != proto.GATEWAY_SERVER_OBJECT_ID and\
                gateway_client.is_connected:
            try:
                gateway_client.send_command(
                    proto.MEMORY_COMMAND_NAME +
                    proto.MEMORY_DEL_SUBCOMMAND_NAME +
                    target_id +
                    "\ne\n")
            except Exception:
                logger.debug("Exception while garbage collecting an object",
                             exc_info=True)
    except Exception:
        logger.debug("Exception while garbage collecting an object",
                     exc_info=True)

Thanks!

@lessthanoptimal

Here's another example.

Output:

Finished
Exception TypeError: "'NoneType' object is not callable" in <function <lambda> at 0x7fdafccf4f50> ignored
Exception TypeError: "'NoneType' object is not callable" in <function <lambda> at 0x7fdafafec410> ignored

Source Code

EntryPoint.java

public class EntryPoint {

    public static void invoke() {
        System.out.println("invoked");
    }

    public static void main(String[] args) {
        GatewayServer gatewayServer = new GatewayServer(new EntryPoint());
        gatewayServer.start();
        System.out.println("Gateway Server Started");
    }
}

experiment/__init__.py

from py4j.java_gateway import JavaGateway

gateway = JavaGateway()

from another import *

experiment/another.py

from experiment import gateway

sample.py

import experiment as ex

ex.gateway.jvm.EntryPoint.invoke()

print "Finished"

experiment.zip

@jvstein

jvstein commented Feb 25, 2016

I was getting a lot of the same errors as above on application shutdown.

Exception TypeError: "'NoneType' object is not callable" in <function <lambda> at 0x7f3a20640f50> ignored

I tracked my issue down to the weakref created in the JavaObject. There's something problematic with the lambda callback (maybe the default parameters that reference fields on self). Moving the weakref creation to a non-lambda callback that is passed to register_output_converter seems to resolve the issue.

I was forcing a conversion of my java objects to python objects in a facade (e.g. list(java_list)). Calling java_list._detach() after I was done cleaned up the garbage collection errors.

@bartdag
Collaborator

bartdag commented Mar 7, 2016

@lessthanoptimal thanks a lot, I can now reproduce the errors with your code!

I am trying several strategies, but so far, they all have drawbacks. With your code example, though, I'm sure I'll be able to fix it soon.

bartdag added a commit that referenced this issue Mar 7, 2016
@bartdag
Collaborator

bartdag commented Mar 7, 2016

I believe I fixed the main symptom with the last commit (the weak ref callback is trying to call a function that no longer exists), but not the underlying problem: the weak ref callback is called after the py4j.java_gateway module is removed from memory. It means that if the interpreter exits and the JavaGateway was not properly shut down, the Java side will never receive the garbage collection commands, potentially creating a leak.

I don't think it is a major issue because Py4J cannot handle every possible crash scenario, but because this is a normal exit, Py4J should try harder (maybe with the atexit facility).

@bartdag
Collaborator

bartdag commented Mar 7, 2016

My bad, after further debugging, I found that java objects are correctly garbage collected (the weak ref callback is called before the java_gateway.py module destruction).

If the 1e010c5 commit fixes the issue for everybody, I'll close this issue for the 0.10 release.

@bartdag bartdag modified the milestones: 0.10, Future Mar 7, 2016
@lessthanoptimal

Just ran the code I posted above with 1e010c5 and got the same exceptions as before. No change as far as I can tell.

@bartdag
Collaborator

bartdag commented Mar 9, 2016

Hi, on which OS are you testing? I just re-tested on Windows 7, Ubuntu 14.04, and Mac OS X, and I see the errors when I delete the guard conditions (e.g., _garbage_collect_connection and) but I no longer see any error when I put back the conditions :-(

@lessthanoptimal

Mint Linux 17.2, which is very similar to Ubuntu 14.04.

I performed the test by checking out the latest code (which was 1e010c5), installing the jar, and running the example without any modification. I was careful to make sure there were no stale py4j jars lying around, but I can perform the test again by referencing the jar directly.

@bartdag
Collaborator

bartdag commented Mar 9, 2016

Very strange :-( I don't think the jar matters here because the problem is 100% located in the Python code. I would just make sure that the PYTHONPATH points to the up-to-date Python code, but you probably already checked that, so I'm at a loss as to what is causing this difference in behavior.
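
A quick way to check which copy of a package the interpreter would pick up is sketched below. It assumes Python 3 (`importlib.util` is not available in Python 2), and `module_origin` is a helper name made up for this example.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the path of the py4j package that would be imported, or None if absent.
print(module_origin("py4j"))
```

If the printed path points at an installed release rather than the fresh checkout, the PYTHONPATH is not taking effect.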

@lessthanoptimal

I had forgotten to make sure it was referencing the latest python code! It's now working as advertised. Running it on a different machine than before, but it's also Mint 17.2.

As a side note, are there instructions on how to build it from source? I figured it out, but it was a bit of a manual process, and my first guesses at which files to run didn't work.

@bartdag
Collaborator

bartdag commented Mar 9, 2016

@lessthanoptimal I'm happy it's working!

I should definitely write more detailed build instructions. There are some in the contributing doc page, but I really need to work on the Java side more (move from ant to gradle, code formatting, etc.).

bartdag added a commit that referenced this issue Mar 12, 2016
@bartdag bartdag modified the milestones: 0.9.2, 0.10 Mar 12, 2016
@bartdag bartdag closed this as completed Mar 12, 2016
bartdag added a commit that referenced this issue May 27, 2016