Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up globals defined in Python source, make sure their __del__ is called #988

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jameslan
Copy link
Contributor

Fix #985

@slozier
Copy link
Contributor

slozier commented Oct 16, 2020

The GC.Collect on Shutdown seems OK, however the cleanup loop seems really icky. What happens if you have a __del__ method that relies on a global and it gets deleted first? Then the __del__ method would throw. A (maybe?) reasonable example where this would occur would be:

_Foo_count = 0

class Foo:
    def __init__(self, name):
        global _Foo_count
        _Foo_count += 1
        self.name = name
        print("Creating instance of Foo (Total: {})".format(_Foo_count))

    def __del__(self):
        global _Foo_count
        _Foo_count -= 1
        print("Deleting i instance of Foo (Total: {})".format(_Foo_count))

It may be unfortunate, but in the particular case of resource cleanup, I think the burden should fall on the Python developer. For example, if someone opens up a file and wants to make sure it gets flushed to disk, then they should use a with statement or call close themselves (I guess that would apply to all io objects). Similarly, if they really want the __del__ to get invoked, then they should at a minimum delete their variable.

This would mean that in #985 the user would be expected to del foo. Similarly for the io examples in #938. However, the sys.stdout.write example in #938 is a perfectly reasonable expectation and we should definitely support that one.

It looks like .NET Framework does a GC collect on app shutdown, but .NET Core does not so unfortunately this means these sorts of problems may be more frequent on .NET Core. I'm not sure if it's possible to mimic the .NET Framework behavior with .NET Core?

I would be happy to be convinced otherwise... @BCSharp any thoughts on the subject?

@jameslan jameslan marked this pull request as draft October 17, 2020 06:19
@jameslan
Copy link
Contributor Author

Yes this is really an issue and a better solution is needed.

BTW, It's probably not just missing a GC at the process exit for .NET Core. I tried renaming the Main to CmdExec in PythonConsoleHost and add

public static int Main(string[] args) {int code = CmdExec(args);

    GC.Collect();

    GC.WaitForPendingFinalizers();return code;}

so that local variable shouldn't be the reason for the object not being collected. But the __del__ of the global var is still not called.

It seems there's a more complicated difference between .NET Framework and .NET Core.

@slozier
Copy link
Contributor

slozier commented Oct 17, 2020

From https://docs.microsoft.com/en-us/dotnet/api/system.object.finalize#how-finalization-works.

The garbage collector then calls the Finalize method automatically under the following conditions:
On .NET Framework only, during shutdown of an application domain, unless the object is exempt from finalization. During shutdown, even objects that are still accessible are finalized.

@BCSharp
Copy link
Member

BCSharp commented Oct 17, 2020

I wrote about finalizers before, so here is just the bottom line: every time I see that finalizers are used as a mechanism for anything else than preventing unmanaged resource leak, I get a sense that something is icky.

In a way I am not surprised that .NET Core turns out not to run all existing finalizers on shutdown. Since their job is to release unmanaged resources, on process termination the OS will do all that cleanup anyway. And since there is no AppDomain support, no wonder the finalizers are being skipped.

IronPython uses the .NET garbage collection mechanism, which is different than CPython's, so there are bound to be differences. Since the specific behaviour of GC is not part of the language, proper Python code should not be written in a way that it depends on it. In practice, however, it does. To quote @slozier:

While it's bad code the fact that it works with CPython means it's more than likely used in the wild...

So another thing that works with CPython is that __del__ methods, if present, will always be called on every instantiated object, even during shutdown. That is a strong case to try to support it here as well. Interestingly, however, CPython does not guarantee that on shutdown, the __del__ method on one object, when accessing another object, will be run before the accessed object is __del__-ed, although it does its best to follow this rule. Consider this:

class A:
    def __init__(self, name, ref):
        self.name = name
        self.ref = ref
        self.del_called = False

    def __del__(self):
        print("Deleting {}, is del already called? {}; referring to {}, is that object already deleted? {}".format(
            self.name, self.del_called, self.ref.name, self.ref.del_called))
        self.del_called = True

a1 = A("a1", None)
a2 = A("a2", a1)
a1.ref = a2
Deleting a1, is del already called? False; referring to a2, is that object already deleted? False
Deleting a2, is del already called? False; referring to a1, is that object already deleted? True

That gave me an idea that builds on the cleanup solution of @jameslan: what if instead actually deleting variables from the module namespace, on shutdown we simply try-invoke __del__ on each of them? This may not be as good as CPython in terms of ordering the calls, but going depth-first may be a reasonable approximation. And of course we would need precautions that __del__ is invoked only once per instance.

Tests/test_regressions.py Outdated Show resolved Hide resolved
@jameslan
Copy link
Contributor Author

jameslan commented Oct 18, 2020

@BCSharp The example you designed is a circular reference and there's no ideal order of calling. But sometimes it needs the correct order:

log = open('log.txt', 'w')

class Foo:
    def __del__(self):
        log.write('__del__\n')

foo = Foo()

After calling __del__ of a FileIO object, the file is closed and can't write anymore. So log should be del after foo.

I'm wondering whether or how .NET Framework figuring out the order, and how CPython is handling this.

@slozier
Copy link
Contributor

slozier commented Oct 18, 2020

Here's a thought I had this morning. Instead of trying to delete the world, would it not be enough for practical cases to release references to the __main__ module (e.g. remove it from sys.modules) and letting the GC do it's thing?

@BCSharp
Copy link
Member

BCSharp commented Oct 18, 2020

example you designed is a circular reference and there's no ideal order of calling

Yes, I was curious what CPython would do in such "impossible" scenario. It had basically two choices:

  1. Skip the second __del__ call to prevent a deleted object from being accessed.
  2. Allow access to the deleted object to ensure that all __del__ methods are being called.

If CPython chose 1, I would agree with @slozier that we just let it be as is in IronPython, and put the burden on the developer to explicitly del anything that they want __del__ to be called on. But with choive 2 I argue that it is more in line with CPython to try to make sure than all __del__ methots are being called, even if some may result in an exception (which we will have to catch unconditionally to ensure that subsequent __del__ methods are being called).

I agree it is not perfect so this is just a heuristic to make the most common scenarios to work. To handle the example for a file object, we could add a rule that objects of types defined lower in module dependency are deleted later than objects of types defined higher up.

To make it work perfectly (excluding circular references), we would need to know and trace all existing references and partial-order them. I believe this is what CPython does. But we don't have such information, although obviously the .NET GC does.

would it not be enough for practical cases to release references to the __main__ module (...) and letting the GC do it's thing?

I think it is worth investigating. It may be better, or it may be worse. The GC has all the information of referential dependencies, so it is possible for it to do partial ordering and finalize objects in a proper sequence like CPython does. But I have never read any explicit guarantees or even hints that this is what the GC does. On the contrary, I remember warnings against accessing any references an object may hold to other managed objects, from within its finalizer, because they may be gone already. It doesn't mean that the GC doesn't do its best, like CPython. It may still do so, and the exceptions only occur for circular references. If so, we are lucky. But it is also possible that the GC assumes the finalizers only dispose unmanaged resources and totally disregards any ordering, for performance reasons. Bad luck then, even worse than with our heuristics. It is also possible that .NET Framework does orderly finalization and .NET Core not. Or that it changes in the future.

From https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/language-specification/basic-concepts#automatic-memory-management

Like other languages that assume the existence of a garbage collector, C# is designed so that the garbage collector may implement a wide range of memory management policies. For instance, C# does not require that destructors be run or that objects be collected as soon as they are eligible, or that destructors be run in any particular order, or on any particular thread.

Idem:

Under normal circumstances the destructor for the object is run once only, though implementation-specific APIs may allow this behavior to be overridden.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

__del__ method is not called
3 participants