New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Garbage collection mystery #9415

Closed
ottemw opened this Issue Dec 19, 2014 · 7 comments

Comments

Projects
None yet
3 participants
@ottemw

ottemw commented Dec 19, 2014

Synopsis: Garbage collection is taking significant time, even when (I think) nothing is destroyed.

After loading the following code...

type Container
  data::Float64
end

function funC(n::Int)
  tic()
  C = Array(Container,n)
  for i = 1:n
    C[i] = Container(0.0)
  end
  println("  C's length: $(length(C))")
  println(" inside time: $(toq())")
end

function testC(n::Int = 100000000, manualGC::Bool = false)
  tic()
  funC(n)
  println("outside time: $(toq())")
  if manualGC
    tic()
    gc()
    println(" manual gc(): $(toq()) (only time for this gc() call)")
  end
  println("----------------------")
end

... and then running once to compile (not shown). I then run as follows:

julia> gc()

julia> @time testC(10000000, false)
  C's length: 10000000
 inside time: 0.263652254
outside time: 0.263681618
----------------------
elapsed time: 0.26371345 seconds (240002000 bytes allocated, 48.61% gc time)

julia> gc()

julia> @time testC(10000000, true)
  C's length: 10000000
 inside time: 0.258428296
outside time: 0.258460679
 manual gc(): 0.047133994 (only time for this gc() call)
----------------------
elapsed time: 0.30575224 seconds (240002728 bytes allocated, 56.58% gc time)

There is a relatively large amount of time spent garbage collecting (approx 50%) . The first test shows that the bulk of the garbage collection happens neither upon the exit of funC() nor upon the exit of testC() since inside time, outside time, and elapsed time are practically identical. Therefore, it must be happening somewhere within funC(). The second test shows that there is some remaining garbage collection possible after the exit of testC() (perhaps the array C?), but that this is relatively small compared with the mystery amount that is happening elsewhere.

My apologizes if I am missing something, but it seems to me that nothing should be destroyed until after testC() exits. If this is, in fact, expected behavior, then I would appreciate an explanation of what hidden local variables are being created and destroyed, and if there is any way to reduce this overhead.

Replacing Container with Float64 does not use nearly the same amount of garbage collection time, which may indicate that the problem has somehting to do with composite types, but I realize that the small size of Float64 vs. Container also complicates the issue.

I am using Julia 0.4.0-dev+2006 (2014-12-08 21:48 UTC)

Thanks for any feedback.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 19, 2014

Member

Mutable types have to be heap allocated. Try changing type Container to immutable Container.

Member

StefanKarpinski commented Dec 19, 2014

Mutable types have to be heap allocated. Try changing type Container to immutable Container.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 19, 2014

Member

Also, this kind of question is really more apt for the julia-users list than a GitHub issue, which is for specific bugs or features.

Member

StefanKarpinski commented Dec 19, 2014

Also, this kind of question is really more apt for the julia-users list than a GitHub issue, which is for specific bugs or features.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Dec 19, 2014

Member

Time spent doing GC is mostly not dictated by how much garbage is collected, but rather by how much heap-allocated memory must be scanned to check if it is garbage or not. You can do a GC pass and collect no garbage at all yet it will take a while.

Member

StefanKarpinski commented Dec 19, 2014

Time spent doing GC is mostly not dictated by how much garbage is collected, but rather by how much heap-allocated memory must be scanned to check if it is garbage or not. You can do a GC pass and collect no garbage at all yet it will take a while.

@ottemw

This comment has been minimized.

Show comment
Hide comment
@ottemw

ottemw Dec 19, 2014

Point taken about the approperiate venue for this type of question, sorry about that, and thank for the quick answer :)

After changing type Container to immutable Container I now get 35% gc time for test 1 and 52.2% gc time for test 2. This is still high (in my opinion). Is this happening because the array C is mutable (and large)? and so all elements of C are essentially scanned during the garbage collection?

ottemw commented Dec 19, 2014

Point taken about the approperiate venue for this type of question, sorry about that, and thank for the quick answer :)

After changing type Container to immutable Container I now get 35% gc time for test 1 and 52.2% gc time for test 2. This is still high (in my opinion). Is this happening because the array C is mutable (and large)? and so all elements of C are essentially scanned during the garbage collection?

@carnaval

This comment has been minimized.

Show comment
Hide comment
@carnaval

carnaval Dec 19, 2014

Contributor

To expand a bit, the gc doesn't spend much time actually "destroying" things (arguably it is the fastest part of the work). Most of the time is spent walking the heap to see if anything needs to be destroyed.
In cases like this, where you allocate a lot of memory and generate very little garbage, the best thing to do is to disable gc around the code, and run it once at the end. If doing this you should probably also profile your memory usage.

Contributor

carnaval commented Dec 19, 2014

To expand a bit, the gc doesn't spend much time actually "destroying" things (arguably it is the fastest part of the work). Most of the time is spent walking the heap to see if anything needs to be destroyed.
In cases like this, where you allocate a lot of memory and generate very little garbage, the best thing to do is to disable gc around the code, and run it once at the end. If doing this you should probably also profile your memory usage.

@carnaval

This comment has been minimized.

Show comment
Hide comment
@carnaval

carnaval Dec 19, 2014

Contributor

Ha, should have refreshed before submit.
Yes, any array of pointers must be scanned in its entierty at each garbage collection.

Contributor

carnaval commented Dec 19, 2014

Ha, should have refreshed before submit.
Yes, any array of pointers must be scanned in its entierty at each garbage collection.

@ottemw

This comment has been minimized.

Show comment
Hide comment
@ottemw

ottemw Dec 19, 2014

Ah, Ok, many thanks!

ottemw commented Dec 19, 2014

Ah, Ok, many thanks!

@ottemw ottemw closed this Dec 19, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment