
GC doesn't seem aggressive enough at times #3997

Closed

rdp opened this issue Feb 3, 2017 · 116 comments

@rdp
Contributor

rdp commented Feb 3, 2017

See if this one seems to use up an inordinate amount of RAM for you too (it does for me on every OS I tried, anyway):

class Foo
  def finalize
    # Invoked when Foo is garbage-collected
    a = 3
    "bye #{3+4}"
  end
end

# Allocates Foo instances (each with a finalizer) forever
loop do
  Foo.new
end

I wonder if it's related to boehm gc finalization cycle or something. Let me know if reproducible. Thank you.

@waj
Member

waj commented Feb 3, 2017

Objects with finalizers are collected after two GC cycles. The first cycle will run the finalizer and the second one will collect the object memory. If you add a call to GC.collect inside the loop the memory usage will stay really low.
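
Something like this, as a minimal sketch of that suggestion (reusing the Foo class from the original report):

class Foo
  def finalize
    "bye #{3 + 4}"
  end
end

loop do
  Foo.new
  GC.collect # forces both cycles to run promptly; slow, but keeps memory low
end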

For this reason I think it's a bad idea to allocate so many unreachable objects with finalizers in a tight loop. In fact, I think finalizers should be avoided whenever possible. They are useful to release OS resources, but otherwise there might always be another (more deterministic) option to release resources or finish some work.

Can you give more details about what you're trying to accomplish with finalizers?

@rdp
Contributor Author

rdp commented Feb 3, 2017

It does stay down with GC.collect (wait, is GC.collect not being called underneath? I do see finalizers being called. Is there a difference between GC.collect and standard garbage collection, and a reason why GC.collect seems to be called infrequently while RAM usage goes up and up?). That exception trace still occurs occasionally (at least on OS X).

I'm trying to debug here why servers on "some machines" seem to grow in used memory forever. The SSL sockets used do use finalizers FWIW, but I somehow doubt they're the culprit #3983

Maybe crystal is keeping an "old reference" to ancient fibers around, or maybe it's only allocating a small amount of memory each time, which BDW doesn't count towards its "should GC now" metric, or something odd?

@benoist
Contributor

benoist commented Feb 4, 2017

What I'm seeing with regard to GC is that the amount of heap memory seems to keep growing, but the actual memory usage is mostly lower; see #3992
If I print GC stats I get this result:

GC::Stats(@heap_size=3233890304, @free_bytes=3233832960, @unmapped_bytes=0, @bytes_since_gc=0, @total_bytes=8000019728)

As you can see in my example, the free bytes are almost the same as the heap size; maybe that's the case here as well?

@rdp
Contributor Author

rdp commented Feb 5, 2017

The good news here is that if I add an explicit GC.collect every so often:

spawn do
  loop do
    sleep 5
    GC.collect
  end
end

It seems RES memory is "constrained" to around 1GB, or with sleep 0.5 it stays around 100 MB. So the GC feels like it's working... not allowing leaks... just not being called often enough, perhaps, for some odd reason?

@RX14
Contributor

RX14 commented Feb 5, 2017

@rdp GC.collect is expensive; RAM not being used by other programs is almost free. For that reason, it's likely that the GC will simply use more RAM rather than reclaiming more often under heavy allocation pressure (i.e. allocating in a loop).

@ysbaddaden
Contributor

Could we talk about GC in a single thread and close other issues? They're all related and down to the same thing: don't allocate repeatedly.

I believe @RX14 put in the final nail: don't test the GC on something it's very bad at, which it warns you about repeatedly, telling you exactly what you'll notice: a memory leak!

If you want to test the GC, do it on real applications. Don't allocate in a tight loop; instead, reuse a single Array object, for example, rather than allocating a new one on each iteration. HTTP::Server, for example, keeps HTTP::Response objects around and resets and reuses them for subsequent requests.
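
Something like this, as a rough sketch of that reuse pattern (the names are made up for illustration):

buffer = [] of String

1000.times do |i|
  buffer.clear          # keep the already-allocated backing storage
  buffer << "item #{i}" # refill it for this iteration
  # ... work with buffer ...
end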

@benoist
Contributor

benoist commented Feb 5, 2017

Well, the reason I started asking these questions is that I'm seeing it in my real application. The actual number of items assigned to an array is a lot lower, and it takes a lot longer for the memory to grow to a size where it causes problems, but it does cause a problem. The only option at that point is to restart the application, after which it can run fine for days.

Any tips on how to test where the leaks actually are in a real application on a production server without reducing code to things the GC wasn't made to do in the first place?

@rdp
Contributor Author

rdp commented Feb 6, 2017

OK I see now that allocations in finalize don't leak per se, so I'm proposing we kind of hijack this as the main thread for cases of "if I perform periodic GC.collect, my process stays at 10MB, otherwise it goes up to 5GB", if that's all right. Here's an example:

class Foo
  def finalize
    "bye #{3}"
  end
end

loop {
  Foo.new
}

I guess what I'd like answered is: if you allocate memory frequently, why does a full GC.collect (which would seem useful) seem not to be fired, or be fired only rarely? Are there bdwgc options that might help, for instance? :) Just for discussion, as it were...

#3992 doesn't seem like quite the same issue, since it's more concerned with allocating very large objects, and GC.collect didn't help there at all, FWIW.

@rdp rdp closed this as completed Feb 6, 2017
@rdp rdp reopened this Feb 6, 2017
@rdp rdp changed the title allocation in finalize might leak? GC doesn't seem aggressive enough at times Feb 6, 2017
@ysbaddaden
Contributor

A quick search turned up this answer, which looks interesting:
https://www.quora.com/Is-there-any-way-to-implement-a-smart-pointer-in-C

@rdp
Contributor Author

rdp commented Feb 23, 2017

#3333 might have been related to this

@rdp rdp mentioned this issue Feb 23, 2017
@sa-0001

sa-0001 commented Mar 1, 2017

I also have an issue with runaway memory use due to repeated allocations within a loop:

fileAll = File.open("/target-path/all.xml", "a")

Dir.glob("/source-path/*.xml").sort.each do |path|
	xml = File.read(path, "utf8")
		.gsub(/>[\s]*\n[\s]*</, "><")
		# ... various replaces ...
		.gsub(/<\/Product>/, "</Product>\n")
	
	fileAll.puts xml
end

fileAll.close()

Of course I get the classic error "GC Warning: Repeated allocation of very large block (appr. size 35000000): May lead to memory leak and poor performance.".

I have tried the code as seen above, along with various other experimental measures:

  • GC.collect at top of loop
  • GC.collect at bottom of loop
  • explicitly setting xml to null at bottom of loop
  • open/close file within loop
  • combinations of the above

In all cases, as the script processes these files of about 20-40MB, the memory usage inevitably creeps up and eventually will crash.

Most advice I see seems to be simply "don't repeatedly allocate within a loop". But ... how to avoid it?

@rdp
Contributor Author

rdp commented Mar 1, 2017 via email

@sa-0001

sa-0001 commented Mar 1, 2017

What are you allocating that needs size 35000000 I wonder...

The script processes hundreds of files of around 20-40MB. Incredibly, the same script in Node.js doesn't crash, although its memory usage certainly levels off pretty high.

@RX14
Contributor

RX14 commented Mar 1, 2017

@sa-0001 the best solution would be to not load the whole file into memory, but as far as I know crystal doesn't have any good ways to work with replacements on streams the same way you can on strings, so I won't advise that.

One thing you could try is using GC.free(xml.as(Void*)). Make sure you put this after the last usage of the object, and even then I can't guarantee it's safe. Obviously you shouldn't have to do this.
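
A rough sketch of where that call could sit in the loop above (an experiment only; freeing an object that is still reachable would be unsafe):

fileAll = File.open("/target-path/all.xml", "a")

Dir.glob("/source-path/*.xml").sort.each do |path|
  xml = File.read(path, "utf8")
    .gsub(/>[\s]*\n[\s]*</, "><")
    .gsub(/<\/Product>/, "</Product>\n")

  fileAll.puts xml        # last use of xml
  GC.free(xml.as(Void*))  # explicitly free the big string instead of waiting for a collection
end

fileAll.close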

@ozra
Contributor

ozra commented Mar 2, 2017

I'm just pinging here so that I can keep track of this issue - ignore me B-)

@crisward
Contributor

crisward commented Mar 2, 2017

Keeping these memory discussions in one place: I'm finding crystal never seems to reduce its heap size even when traffic drops; instead the GC seems to run less often. This may be intended behaviour.

This graphed data is a section from yesterday
https://www.duodesign.co.uk/memstats/
I've attached an image of the graph as the above link may change in the future.

(screenshot: memory usage graph, 2017-03-02 09:51)

At 13:14 I ran apache bench and as you'd expect the memory usage leapt up. It then stays there until I restart crystal.

@RX14
Contributor

RX14 commented Mar 2, 2017

Libgc is a non-compacting garbage collector, which buckets allocations into separate heaps based on allocation size. This means that the heap(s) can get fragmented. Also, I'm not sure what the heuristics for when to return heap memory to the OS actually are.

Again, if you want an answer from people who know what they're talking about, ask the libgc guys.

@ozra you can subscribe to issue notifications without commenting by using the Subscribe button in the sidebar.

@crisward
Contributor

crisward commented Mar 2, 2017

ivmai/bdwgc#152 (comment)

First, libgc should be built with configure --enable-munmap option. An optional parameter can be specified along with the option - to set the desired memory blocks unmapping threshold (the number of sequential garbage collections for which a candidate block for unmapping should remain free).

I'm not sure if this is set? Though if not, it would explain the current behaviour.

After some testing with @sdogruyol's help, we discovered Go seems to release memory to the OS after around 5 minutes, returning to its pre-load state after up to 30 minutes. It would be good if crystal could mimic this behaviour.

@asterite
Member

asterite commented Mar 2, 2017

It seems like a good thing to try. Right now we aren't setting that flag. I didn't even know libgc could be configured like that (I thought it would never give the memory back to the OS).

On OSX we are using bdw-gc from homebrew, which doesn't seem to set it either...

I wonder if there's a reason why it's not enabled by default. Why could that be?

@crisward
Contributor

crisward commented Mar 2, 2017

The only docs I can find referencing this are here, which makes me wonder if it's a Windows-only option.

@crisward
Contributor

crisward commented Mar 2, 2017

I've compiled libgc with that option on mac. I'll try building crystal with it and try some tests.

@luislavena
Contributor

@crisward here are docs about it:

https://github.com/ivmai/bdwgc/blob/64bed186fc1ca9a654c9bca063089f5b7ead87be/doc/README.macros#L287-L291

It seems it combines it with mmap:

Causes memory to be returned to the OS under the right circumstances. This currently disables VM-based incremental collection (except for Win32 with GetWriteWatch() available).
Works under some Unix, Linux and Windows versions.
Requires USE_MMAP except for Windows.

@crisward
Contributor

crisward commented Mar 2, 2017

I've got it working.
(screenshot: memory usage graph after the rebuilt libgc, 2017-03-02 20:03)

@crisward
Contributor

crisward commented Mar 2, 2017

I set it to 3, so it returns memory after 3 GC cycles, i.e.:

brew install automake pkg-config #mac only
git clone https://github.com/ivmai/bdwgc.git
cd bdwgc
git clone https://github.com/ivmai/libatomic_ops.git
autoreconf -vif
automake --add-missing
./configure --enable-munmap=3 
make
make check
sudo make install

If you want to reproduce this on mac, you need to do the brew steps above

@luislavena
Contributor

Looks promising @crisward, but perhaps 3 cycles is too short for a long-running process (and might result in slowdowns from releasing memory to the OS and then reclaiming it during high traffic).

There are also interesting details unrelated to this that also apply to parallelism: by default libgc might fail in multi-threaded environments unless mmap is used (ref). I think that change is long gone, but usage of mmap and possibly parallel marking might be required for thread support.

/cc @asterite

@ysbaddaden
Contributor

... it would require reopening and re-decoding the debug sections every time an exception is printed, which is costly (in both memory and CPU). This could be abused as an attack vector, too.

@crisward
Contributor

@ysbaddaden what about cleaning up the @@dwarf_line_numbers after a reasonable interval? I noticed there is already something that cleans up unused fibers in kernel.cr. Something similar could be done.

The --no-debug option isn't easy for me to activate as I'm using the heroku buildpack. Also, I'd quite like stack traces to be sent to Sentry; I just don't want applications which can suddenly use more memory than I expected, then hold onto it. The exception which triggered this only happens maybe once a day.

Crystal's low memory footprint is one of its benefits. We run lots of content-managed sites from a single server. The chances of them all having uncaught exceptions at the same time are very low. So being able to clear the memory used after a reasonable interval would be good. I can obviously do this at the application level, but it could be useful to other developers.

@benoist
Contributor

benoist commented Mar 22, 2017

60MB is significant if you normally only use 1MB. Having said that, 60MB shouldn't be a problem. If it is a problem because you want to run as many processes as possible on a 512MB VM, compiling with --no-debug seems like a good solution.

Even if you clear the dwarf line numbers, having your application spike like that can be dangerous if memory availability is low.

@crisward
Contributor

In some ways, it may be better to assign @@dwarf_line_numbers on application launch. At least you know what you're dealing with, instead of working under the false assumption that you have a 1MB app running. Though I think if you're running lots of small apps, and memory usage drops again after a minute, one or two of them spiking at once wouldn't be a problem. Also, I find it odd that different setups create vastly different memory loads for the same app.

@ysbaddaden
Contributor

assign @@dwarf_line_numbers on application launch

That would affect programs with a short lifespan.

@crisward
Contributor

What about adding the patched method above? Perhaps call it CallStack.drop_exception_cache, then document what it does so app developers can decide whether they want to use it.
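
A rough sketch of what that could look like (the method name is just the one proposed here, and the class variable is the one discussed above; internals may differ between Crystal versions):

struct CallStack
  # Drop the cached DWARF line numbers so their memory can be reclaimed;
  # the next backtrace would have to re-decode the debug sections.
  def self.drop_exception_cache
    @@dwarf_line_numbers = nil
  end
end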

@rdp
Contributor Author

rdp commented Mar 22, 2017 via email

@ozra
Contributor

ozra commented Mar 23, 2017

@ysbaddaden - attacking a swarm of micro apps expected to stay at a few MB each, running on a server, and spiking them all via requests causing OOM kills and whatnot, could also be considered an attack vector, though.

@crisward , as @ysbaddaden says: --no-debug flag! - If that's hard to flip in your deployment, then I think you're using the wrong distro/packager/framework/whatever setup!

@ysbaddaden
Contributor

GC and heap memory are only one side of the memory usage. What about the process memory? What about the stacks (each Fiber consumes 8MB for its stack)?

That being said, I don't understand how so much memory can be used; a hello world program only allocates less than 6KB of memory. Maybe Debug::DWARF::LineNumbers retains some references that can't be freed (e.g. @io).

@crisward
Contributor

crisward commented Mar 23, 2017

@ozra I use https://github.com/crystal-lang/heroku-buildpack-crystal (I have a fork with the updated Libgc and I've added --no-debug for now, but actually find stack traces useful)

@ysbaddaden I agree, I'll have a read through the code, if I uncover anything I'll report back.

@crisward
Contributor

I've created the same app and built it for release to run on a mac, and the memory issue doesn't happen with stack traces. So whatever is causing this seems platform-specific. I know DWARF has a variety of formats, and there are a few platform checks within the CallStack code, so either of these may play a part. If the code is holding references to the io, it's either in one of the platform branches or isn't the culprit.

@ozra
Contributor

ozra commented Mar 28, 2017

On a side note: @ysbaddaden - The stacks for fibers are mmap'ed, and the kernel maps those pages (which are 4KB for small ones on most systems) to physical memory on page faults, so most fibers should probably use just 8KB of actual reserved physical memory for their stack (if I remember correctly, there's a page set aside as a "protection zone" in the fiber stack alloc). Now, I might have missed something completely, or forgotten, about the alloc code...

@benoist
Contributor

benoist commented Aug 6, 2017

lib LibGC
  struct ProfStats
    heap_size : Word
    free_bytes : Word
    unmapped_bytes : Word
    bytes_since_gc : Word
    bytes_before_gc : Word
    non_gc_bytes : Word
    gc_no : Word
    markers_m1 : Word
    bytes_reclaimed_since_gc : Word
    reclaimed_bytes_before_gc : Word
  end

  fun get_prof_stats = GC_get_prof_stats(stats : ProfStats*, size : SizeT)
end

def gc_stats
  stats = Pointer(LibGC::ProfStats).malloc
  LibGC.get_prof_stats(stats, sizeof(LibGC::ProfStats))
  stats.value
end

I've recently started using the code above to get a bit more info about the GC, for example the number of times it has run.
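
For example, something like this (using the gc_stats helper above) prints the collection count and heap numbers:

GC.collect
stats = gc_stats
puts "GC runs: #{stats.gc_no}"
puts "heap: #{stats.heap_size} bytes, free: #{stats.free_bytes} bytes"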

(screenshot: graph of the collected GC stats)

@rdp
Contributor Author

rdp commented Aug 30, 2019

My initial example seems quite stable around 2MB RAM used now with 0.30.1, thanks everyone! For followers: to adjust how often it collects, apparently you use the GC_free_space_divisor setting and possibly some others mentioned near there, in case anybody ever wants to go down that route. Also, apparently for programs that use a "lot of heap" there's an incremental/generational mode, GC_enable_incremental, that might be nice to add a method for sometime; it's said to be more responsive anyway. (Descriptions: https://github.com/ivmai/bdwgc/ ) Cheers!
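
In case anyone wants to experiment, a rough sketch of binding those knobs from Crystal (GC_set_free_space_divisor and GC_enable_incremental are bdwgc entry points; whether the stdlib already exposes them depends on your Crystal version):

lib LibGC
  fun set_free_space_divisor = GC_set_free_space_divisor(value : LibC::ULong)
  fun enable_incremental = GC_enable_incremental
end

LibGC.set_free_space_divisor(10) # a higher divisor collects more often and keeps less free heap
LibGC.enable_incremental         # opt in to incremental/generational collection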

@rdp
Contributor Author

rdp commented Feb 13, 2020

Interestingly, my app in production tends to use 150MB of RAM, but if I have a GC loop:

spawn do
  loop do
    sleep 1.0
    GC.collect
  end
end

it stays around 40MB. This isn't as bad as it once was but might still be worth investigating at some point, at least to see whether the APIs mentioned earlier can keep the RAM down or not [and whether it affects performance or not]. Leave a comment if anybody would like this reopened; if not, I might get to it "some decade or so" :)

@jhass
Member

jhass commented Feb 13, 2020

This probably has been mentioned before in this thread, I didn't reread, but generally this is working as intended for most GCs.

Allocating memory from the OS is costly, so many GCs employ a heuristic for whether to release memory back to the system or not. The simplest metric for a heuristic like that is the number of GC cycles that the space stayed free, hence your loop doing what it does. Generally, most long-running workloads have spiky memory usage: think of a server application that processes and answers requests and then throws that state away, or a background processing daemon, which is basically the same thing. Here there are clear performance advantages to avoiding the roundtrip to the OS memory allocator each time a request comes in. For short-lived programs, on the other hand, it really doesn't matter how good the runtime is at releasing memory ASAP, as the program is going to terminate quickly anyway. So it's better to optimize the heuristic for the first case.

@rdp
Contributor Author

rdp commented Feb 14, 2020

Yeah, those are good points. I need to think more about this. A couple more random ideas: perform a GC "on idle" (when the scheduler detects idle). Maybe add a flag "don't ever cache the dwarf line numbers" [if exceptions are rare enough and you can handle the RAM usage when you do need an exception], or maybe even an option "don't ever add line information to exceptions" to avoid the RAM hit entirely [if you can't afford the RAM hit at all], or "re-lookup PC locations one by one, so it never builds the large dwarf structure". Just some thoughts... :)

@rdp
Contributor Author

rdp commented Apr 21, 2020

For followers, you can also control some GC functions using environment variables: https://github.com/ivmai/bdwgc/blob/master/doc/README.environment
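
For example, running your binary as GC_FREE_SPACE_DIVISOR=10 ./my_app (my_app being whatever your binary is called) should be picked up by bdwgc at startup and make it collect more often relative to heap growth.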

@rdp
Contributor Author

rdp commented Nov 10, 2021

You can avoid the DWARF lines at runtime using CRYSTAL_LOAD_DWARF too FWIW. I think it would be nice if crystal someday came out with a "try to save RAM" compile time parameter :)

@rdp
Contributor Author

rdp commented Dec 22, 2021

Setting GC_FREE_SPACE_DIVISOR=200 didn't seem to help, FWIW; memory still seemed to just increase monotonically. Still not sure what is going on exactly.

@rdp
Contributor Author

rdp commented Oct 16, 2022

I did notice there are instructions here for trying to figure out why the heap grows out of control, when it does. Is it because of finalizers? Is BDW just not collecting often enough when it maxes out?
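
One cheap way to watch for that from inside a running app is to log GC.stats periodically (the same stats shown earlier in this thread) and see whether heap_size keeps growing while free_bytes stays high. A sketch:

spawn do
  loop do
    sleep 60
    s = GC.stats
    puts "heap=#{s.heap_size} free=#{s.free_bytes} since_gc=#{s.bytes_since_gc}"
  end
end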

@Hadeweka

Hadeweka commented Dec 4, 2023

I'm not 100% sure if this is related, but it seems to be a similar issue (at least on Windows):

https://forum.crystal-lang.org/t/memory-leak-on-windows/6197

EDIT: Wording
