stop using finalizers for resource management? #11207

JeffBezanson · 2015-05-09T02:39:38Z

Finalizers are inefficient and unpredictable. And with the new GC, it might take much longer to get around to freeing an object, therefore tying up its resources longer. Ideally releasing external resources should not be tied to how memory management works.

We are already not far from this with the open(f) do construct. I think that and/or with should be used. Perhaps there could be some other mechanism for registering files to close eventually.

Discussed this with @carnaval .

ScottPJones · 2015-05-09T02:48:20Z

Hmmm... I was just going to start using finalizers (but still had some questions about them to investigate).
My needs are simple: 1 pointer in Julia to a structure allocated / controlled by C, which also contains a pointer possibly allocated by my C code, or else allocated by DBMS allocator. If the object goes out of scope in Julia, and is about to be GCed, I thought the finalizer would allow me to call my C release code... It concerned me though that finalizers are apparently associated with the object, not the type.

quinnj · 2015-05-09T02:52:38Z

Go has the defer keyword, the usage is:

f := os.Open(file)
defer f.Close()

which pushes f.Close() onto a stack of function calls that get evaluated
when the enclosing scope ends.

http://blog.golang.org/defer-panic-and-recover

-Jacob

timholy · 2015-05-09T02:55:41Z

If one is rethinking this, the machinations of CUDArt to manage GPU memory in a GC-compatible way are probably amusing fodder for thought. The arrival of finalize was a huge step forward.

NOTE: 2nd link updated to correct target.

carlobaldassi · 2015-05-09T03:18:53Z

For reference, I'll also add the case of GLPK, which bears similarities with CUDArt, see e.g. here and here.

JeffBezanson · 2015-05-09T03:23:05Z

See also #1037

It would be great to get rid of finalizers entirely, but that's probably not realistic. For starters, I would still allow finalizers but not use them to close files and such.

@ScottPJones you can definitely use finalizers to call your C release code.

Finalizers can be associated with a type by adding them to all instances in the constructor :)
Seriously though, I'm not sure how it would work to associate finalizers with types. For example, you can't iterate over all dead objects to see which ones might have finalizers. And how is the right type identified? That's usually done through method calls, but who calls what function, and when, to determine what to finalize? The simplest thing is to just give the GC a list of objects with finalizers attached.

elextr · 2015-05-09T03:37:57Z

@JeffBezanson it is very useful to have a mechanism to allow freeing of limited resources (like file descriptors) as soon as reasonably possible. As you say finalizers will eventually get around to it, but that doesn't prevent exhaustion in the meantime.

One question: are finalizers always run, no matter how the program exits, so that it's always possible to be sure no resource remains locked?

ScottPJones · 2015-05-09T03:48:46Z

@elextr yes - that sort of exhaustion has been a big issue with the sort of code that I write, where it has to stay running with minimal downtime for years...

elextr · 2015-05-09T07:26:51Z

@ScottPJones then it's probably best if you do your resource management explicitly yourself; certainly don't rely on anything in the semantics of any language unless it is specified and guaranteed.

Specifically, the semantics of the Julia GC, and hence of finalizers, are not guaranteed. The GC happens to have recently changed to a generational collector in 0.4 (it is not generational in 0.3), and that may change again in 0.4/0.5 when threading lands (for example). All you can know about a finalizer is that the object it relates to is no longer in use when the finalizer is run, but my reading of this suggests that it may not be run for bitstypes, hence my question above.

aviks · 2015-05-09T10:19:21Z

Another use case is the interaction of the Java and Julia GCs in JavaCall. Objects retrieved from Java into Julia need to be explicitly de-referenced in Java when they are no longer used within Julia. This is achieved via finalizers, which works fine, except that the Java VM can be under greater memory pressure than the Julia VM. In that case, the JVM can run out of memory before Julia decides that the GC needs to be run.

timholy · 2015-05-09T10:22:11Z

@ScottPJones, I hear you. In several places like HDF5 and CUDArt, the key was to write code like

open(filename) do file
    # do stuff
end

which guarantees that file will be closed (immediately upon completion) even if stuff has an error in it. That construct currently has some performance overhead (anonymous functions), but in most cases is worth it. You can manually use try...finally in cases where you can't tolerate the overhead.

ScottPJones · 2015-05-09T10:40:41Z

@elextr I should have been clearer... I'm not planning on relying on the finalizers at all.
My APIs are identical (with some name changes, can't have ! in C function names), in C11, C++11,
Python, Java, and Julia...
I just want to prevent memory leaks, esp. when people are playing around / prototyping stuff in the REPL.
For example, something like the following:

myObj = DA.PackedData(1000) # creates a packed data buffer with initial size at least 1000 bytes.
push!(myObj, "Encode a string")
push!(myObj, 5.2332) # encode an IEEE binary floating point number
save!(myDBMS, myObj) # write packed record out as a row
release(myObj) # Release the underlying C buffer object, 0 out the pointer in the Julia myObj object...

What happened many times, in the REPL, is I accidentally set myObj to something else before calling release... so I lost memory each time...
Using the finalizer is just to catch stupid things like that...

@timholy That's good to know, but is that sort of syntax only for files? (sorry, my newbieness with Julia is showing again!)

timholy · 2015-05-09T12:08:53Z

@ScottPJones, it's a standard julia convention, see http://docs.julialang.org/en/latest/manual/functions/#do-block-syntax-for-function-arguments. You have to write a version of your function that takes another function as the first argument (see, e.g., the methods defined for open). Internally, it's just try...finally.
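To make the "function as first argument" convention concrete, here is a minimal sketch, assuming current Julia 1.x syntax. The `Handle` type and the `acquire` / `release!` names are hypothetical stand-ins for whatever resource is being wrapped; only the `try ... finally` shape is the point.

```julia
# Hypothetical resource type standing in for a file, GPU buffer, or C handle.
mutable struct Handle
    name::String
    isopen::Bool
end

# Plain acquire/release pair (names are assumptions for this sketch).
acquire(name::AbstractString) = Handle(String(name), true)
release!(h::Handle) = (h.isopen = false; nothing)

# Function-first method: this is what makes `acquire(name) do h ... end` work.
function acquire(f::Function, name::AbstractString)
    h = acquire(name)
    try
        return f(h)
    finally
        release!(h)   # runs on normal return and also when `f` throws
    end
end

# Usage: the handle is released as soon as the block finishes.
result = acquire("demo") do h
    "used $(h.name)"
end
```

The block form costs an extra closure, but the release no longer depends on when the GC happens to run.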

tknopp · 2015-05-09T12:09:22Z

@ScottPJones No, the do syntax is not restricted to files; see http://julia.readthedocs.org/en/latest/manual/functions/#do-block-syntax-for-function-arguments

I think this is the standard way to do it in Julia and, as @timholy said, it is used in various places in Julia land. We also use it in some places in Gtk.jl.

Where the finalizers are important is when the type goes out of scope. We have for instance in Gtk.jl the situation where it is really needed.

tknopp · 2015-05-09T12:10:01Z

oh Tim is faster, sorry.

ScottPJones · 2015-05-09T12:50:34Z

Thanks @tknopp & @timholy! Sorry for the noise, I really am trying to memorize the manual, but Julia is such a large language!

timholy · 2015-05-09T14:59:16Z

It definitely takes a while, no apologies needed.

tknopp · 2015-05-09T18:53:38Z

@JeffBezanson: What is the actual proposal of this issue? Isn't the do syntax already used consistently for files? I think finalizers are useful when the scope is not local.

carnaval · 2015-05-09T19:05:47Z

We probably can't remove finalizers altogether because then we would be leaking resources. I think this issue is more about conventions on "good practice for resource management", since the biggest problem (besides performance) is that the gc is very lazy: it only works under pressure, that is, memory pressure. It has no way to know e.g. how many file descriptors are open in the program, so if your handle object is small, the gc will be completely fine keeping it around for a long time while you exhaust your open fd limit.

I don't have any good idea about this by the way...

wildart · 2015-05-09T19:14:58Z

I found finalizers unreliable. When interfacing with C code, I would really prefer something like Go's defer rather than using a finalizer to release resources. I opt for manual resource management even though it multiplies the amount of code that has to be written.

JeffBezanson · 2015-05-09T19:22:29Z

@tknopp good question. My proposal would be

Have a standard "release resources" function, maybe close, maybe finalize, but the same for every type with this issue.
Use with instead of type-specific functions like open for this.
Document that this should be used instead of relying on finalizers, and use it ourselves everywhere we can.
Make a concerted effort to remove use of finalizers, e.g. for BigInt
Don't add finalizers to I/O-related objects by default. Instead maybe the REPL could add finalizers for interactive objects, to call the standard close function.

The last item sounds drastic, but as it is finalizers might not be invoked for a very long time, and unpredictably. You could still use finalizers as an escape hatch. If you're not sure how to handle releasing some object, you can just call finalizer(x) on it any time.
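A rough sketch of what the first two items could look like, assuming current Julia 1.x syntax. `with` here is a hypothetical helper written for illustration (it is not an existing Base function in this form), `TempResource` is a made-up type, and `close` is used as the single standard release function.

```julia
# Made-up resource type; any type with a `close` method would do.
mutable struct TempResource
    closed::Bool
end
Base.close(r::TempResource) = (r.closed = true; nothing)

# Hypothetical `with`: run `f` on the resources, then close all of them,
# whether `f` returned normally or threw.
function with(f, resources...)
    try
        return f(resources...)
    finally
        for r in reverse(resources)   # release in reverse order of acquisition
            close(r)
        end
    end
end

a, b = TempResource(false), TempResource(false)
state_inside = with(a, b) do x, y
    (x.closed, y.closed)              # both still open inside the block
end
```

Unlike `open(f, ...)`, nothing here is specific to one type; the only requirement is a `close` method. A more careful version would also keep closing the remaining resources if one `close` call throws.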

tknopp · 2015-05-09T19:51:04Z

Ok. Is there an issue that explains what with is and how it differs from the do syntax?

carlobaldassi · 2015-05-09T19:58:28Z

I'll just add another small issue about using finalizers with IO objects which I very recently discovered: on Windows, trying to call rm on a file with an open descriptor fails. This made the FastaIO tests fail, because I was relying on finalize to close the file after I finished reading it, and I was deleting it after the tests. I never noticed the bug since on Linux that works fine.
So this is probably not a very common situation, but — in association with the unpredictability of the GC — may lead to OS-specific, non-deterministic bugs.

elextr · 2015-05-09T20:09:38Z

@JeffBezanson how do you propose to handle objects whose lifetime exceeds the scope of the with, eg ones returned from the function?

JeffBezanson · 2015-05-09T20:14:37Z

If an object lifetime exceeds the local scope, you can't use with. The only options I see in that case are (1) somebody downstream uses with, (2) you add a finalizer to the object before returning it.

StefanKarpinski · 2015-05-10T18:28:32Z

Another idea is to have some types opt into reference counting and finalize them when their counts get to zero. It's not entirely clear to me how to make a mix of refcounting and not work, however.

jakebolewski · 2015-05-10T18:32:06Z

watch out, may be flayed by mentioning reference counting :-)

carnaval · 2015-05-10T19:11:12Z

the problem with mixed refcount is that a refcounted object can still be kept alive by a non refcounted one (worst case : the object keeping it alive is in oldgen). Then you don't get the "immediate finalization" property.

carnaval · 2015-05-10T19:18:47Z

To alleviate the late finalization problem we could also teach the gc about other kinds of resources so that they can be taken into account in the collection heuristics. So e.g. you could register a "file descriptor" or "GPU memory" resource, and then explicitly say: I allocated X of this, and running this finalizer will get me Y of this back.

Painful to implement though. And it can only make gc overhead worse (by collecting more often).

StefanKarpinski · 2015-05-10T19:31:55Z

> the problem with mixed refcount is that a refcounted object can still be kept alive by a non refcounted one (worst case : the object keeping it alive is in oldgen). Then you don't get the "immediate finalization" property.

Yes, in such a scheme every reference that could transitively reach anything refcounted would need to maintain a refcount. That includes most abstract slots, and slots in data structures that can refer to refcounted objects. But that still excludes most things we care about the performance of.
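For illustration only, here is what opt-in reference counting looks like when done by hand in current Julia 1.x. This is a sketch, not a proposal for how the runtime would do it: `RefCounted`, `retain!` and `release!` are hypothetical names, and every holder has to call them explicitly, which is exactly the bookkeeping a language-level scheme would have to automate.

```julia
# Hypothetical wrapper: `cleanup` is run once the count drops to zero.
mutable struct RefCounted{T}
    value::T
    count::Int
    cleanup::Function
end

# A new wrapper starts with one owner.
RefCounted(value, cleanup) = RefCounted(value, 1, cleanup)

# Register one more owner.
retain!(r::RefCounted) = (r.count += 1; r)

# Drop one owner; run the cleanup immediately when nobody is left.
function release!(r::RefCounted)
    r.count > 0 || error("release! called more often than retain!")
    r.count -= 1
    r.count == 0 && r.cleanup(r.value)
    return r.count
end

log = String[]
r = RefCounted("fd 3", v -> push!(log, "closed $v"))
retain!(r)      # a second owner appears
release!(r)     # first owner done, resource still alive
release!(r)     # last owner done, cleanup runs right here
```

Note that this does nothing about the problem @carnaval raised: an ordinary, uncounted reference to `r` can still keep it alive with a nonzero count.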

ScottPJones · 2015-06-03T23:50:11Z

👍 to @quinnj's proposal, it has what I had been asking for. The only thing is that I'd suggest naming it finalize!, because it definitely modifies the object, and to avoid confusion with the old finalize method.

aviks · 2015-06-04T07:55:14Z

Ah, ok, thanks... I misunderstood.

Yes, it should be sufficient to have finalisers associated with types. Currently, every object gets the same finaliser function. Of course, the type parameters and fields will need to be available to the finaliser.

quinnj · 2015-06-04T17:12:52Z

I wonder if the mmap and WeakKeyDict cases call for something like

finalize(a) do f
   # code to finalize `a` which is a type not declared with a `finalize` method
end

This wouldn't actually do the finalizing, just "move" a to the finalize pool of objects and the function argument would be run as the finalize method whenever that happens, either manually, from a with block, or when the object was destroyed.

Not sure how feasible "moving an object to the finalization pool of objects" would be though....

amitmurthy · 2015-06-05T04:39:58Z

Is #10960 then an artifact of the new gc? That could explain memory leaks with shared and distributed arrays. An ability to explicitly "free" remote objects will be quite useful, especially in cases where people are using distributed arrays across multiple hosts specifically to leverage every bit of memory available.

ScottPJones · 2015-06-16T17:35:12Z

@quinnj Carrying the discussion from #11280 over here, as requested...
You said:

the problem with being able to call your own finalize! is you then need someway to tell if an object has been finalized or not.

That's precisely what I said I'd done, I have a pointer to something that needs to be finalized, so I simply set it to zero (C_NULL) in finalize!. If you don't have a pointer, a flag can be used. It is an extra check on each reference, but you stop having segfaults or problems with things outside of Julia being released.
It was the only way I could think of currently to make sure something can be finalized quickly most of the time, and still prevent memory (or other resource) leakage when things get GCed.
Do you have any better suggestions to handle that?
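The pattern described above can be sketched roughly as follows, assuming current Julia 1.x (where the argument order is `finalizer(f, x)`, the reverse of the 0.4-era `finalizer(x, f)` used elsewhere in this thread). `PackedBuffer` and `release!` are hypothetical names, and `Libc.malloc` / `Libc.free` stand in for the real C allocate and release calls.

```julia
function release! end   # forward declaration so the constructor can refer to it

# Hypothetical wrapper around a C-owned buffer.
mutable struct PackedBuffer
    ptr::Ptr{Cvoid}
    function PackedBuffer(nbytes::Integer)
        buf = new(Libc.malloc(nbytes))
        buf.ptr == C_NULL && throw(OutOfMemoryError())
        finalizer(release!, buf)   # safety net if release! is never called
        return buf
    end
end

# Explicit, idempotent release: the null pointer doubles as the
# "already finalized" flag, so a later finalizer run is a no-op.
function release!(buf::PackedBuffer)
    if buf.ptr != C_NULL
        Libc.free(buf.ptr)
        buf.ptr = C_NULL
    end
    return nothing
end

isreleased(buf::PackedBuffer) = buf.ptr == C_NULL

b = PackedBuffer(1000)
release!(b)   # prompt release in the normal case
release!(b)   # harmless: the pointer is already C_NULL
```

Every other function that touches `buf.ptr` would need the same `C_NULL` check, which is the per-reference cost mentioned above.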

amitmurthy · 2015-06-23T19:41:08Z

Just noticed this when there are multiple finalizers defined for an object.

julia> type Foo
           v
       end

julia> f=Foo(0)
Foo(0)

julia> Foo(0)
Foo(0)

julia> for i in 1:10
           finalizer(f, x-> @schedule print("FINALIZED $i \n") )
       end

julia> f=nothing

julia> for i in 1:10
           print("calling gc for the $i th time\n");
           gc()
       end
calling gc for the 1 th time
calling gc for the 2 th time
FINALIZED 10 
FINALIZED 9 
calling gc for the 3 th time
FINALIZED 1 
FINALIZED 2 
calling gc for the 4 th time
FINALIZED 8 
calling gc for the 5 th time
FINALIZED 7 
calling gc for the 6 th time
FINALIZED 6 
calling gc for the 7 th time
FINALIZED 5 
calling gc for the 8 th time
FINALIZED 4 
calling gc for the 9 th time
FINALIZED 3 
calling gc for the 10 th time

Found it a little odd that all the finalizers are not executed together at the first gc itself.

ScottPJones · 2015-06-23T20:28:11Z

This wouldn't happen if @timholy's idea (seconded by @quinnj [and myself]) of using a tag bit to say whether the finalizer had been run for an object were adopted.
(or I guess that is a different object each time... never mind!)

yuyichao · 2015-06-23T22:47:53Z

@amitmurthy This is somewhat related to the (sub-)issue I noticed in #11814 (comment). My guess is that running too many finalizers at the same time would cause too long a pause, but @carnaval should know for sure.

yuyichao · 2015-11-15T02:07:53Z

FWIW, the issue above in #11207 (comment) is solved by #13995 .

amitmurthy · 2015-11-15T02:17:33Z

Cool. And regarding the topic of this issue - It is not just about files, I don't think we have a choice but to use finalizers for remote references. We can document that users can manually call finalize for better control on when remote resources get released, else it will only happen when gc eventually gets around to it.
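As a small illustration of calling finalize by hand, here is a sketch in current Julia 1.x syntax. `RemoteHandle`, `make_handle` and `released_ids` are made up for the example; a real remote reference would send a release message instead of recording an id.

```julia
# Made-up stand-in for a handle to some remote resource.
mutable struct RemoteHandle
    id::Int
end

# Records which handles have been released (stands in for the real cleanup).
const released_ids = Int[]

function make_handle(id::Integer)
    h = RemoteHandle(id)
    finalizer(x -> push!(released_ids, x.id), h)   # runs at GC time if never called manually
    return h
end

h = make_handle(42)
finalize(h)   # run the registered finalizer now instead of waiting for the GC
```

If the user forgets the manual call, the same finalizer still runs whenever the GC eventually collects the handle.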

stevengj · 2021-02-16T16:21:16Z

Isn't this issue essentially a duplicate of #7721?

JeffBezanson added the kind:speculative label ("Whether the change will be implemented is speculative") May 9, 2015

yuyichao mentioned this issue Jun 16, 2015: conservative stack scanning? #11714 (Closed)

quinnj mentioned this issue Jun 16, 2015: RFC: mmap makeover #11280 (Merged)

quinnj mentioned this issue Oct 26, 2015: Jq/datastreams2 JuliaDatabases/SQLite.jl#80 (Merged)

quinnj mentioned this issue Dec 9, 2015: Define close method? BioJulia/Libz.jl#5 (Closed)

JeffBezanson mentioned this issue Jan 12, 2016: with for deterministic destruction #7721 (Open)

stevengj mentioned this issue Aug 11, 2016: Add an optional "at stop" method to the iterator interface #17954 (Closed)

StefanKarpinski added this to the 1.0 milestone Aug 19, 2016

JeffBezanson modified the milestones: 2.0+, 1.0 May 2, 2017

JeffBezanson added the design label ("Design of APIs or of the language itself") May 2, 2017

davidanthoff mentioned this issue Jun 11, 2017: File stays locked after csvread call on Windows queryverse/TextParse.jl#20 (Closed)

davidanthoff mentioned this issue Jun 21, 2017: Iterators and resource management #22466 (Open)

stevengj mentioned this issue Nov 24, 2017: WIP: RFC: Create type SecureString #24738 (Closed)

johirbuet mentioned this issue Aug 24, 2018: Session started at Line 169 is closed after the use tensorflow/tensorboard#1378 (Closed)

joelfrederico mentioned this issue Dec 7, 2018: memory leak JuliaInterop/ZMQ.jl#76 (Open)

ali-ramadhan mentioned this issue Oct 1, 2019: A fast netcdf output writer CliMA/Oceananigans.jl#433 (Merged)

vtjnash mentioned this issue May 3, 2020: run finalizers on their own thread #35689 (Open)

magicly mentioned this issue Apr 27, 2021: CUDA3 seems to have memory leak JuliaGPU/CUDA.jl#866 (Closed)

mkitti mentioned this issue Oct 24, 2022: Add built-in support to write compound data types JuliaIO/HDF5.jl#1013 (Merged)
