-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak #97
Comments
Does it GC at all? Which version are you running? |
Yes. And in the beginning things look just fine, it is only after a while that the heap starts growing. I am running 0.10.1. I can offer a wild guess that it isn't a GC problem at all, but rather one of timing: there are leftovers of core.async structures because there is no time to process them, and things start accumulating. I suspect that if I inserted a pause every one million transactions, things might clear up (I'll actually try that). This is a critical bug, it prevents me from using clj-rethinkdb at all. |
Turns out my wild guess was wrong: just running that loop in 100k increments, with pauses inbetween runs, is enough to eventually exhaust the heap. Some core.async structures related to pub/sub are being retained forever. Apparently people didn't notice it because nobody makes a lot of transactions. But every clj-rethinkdb app out there right now leaks memory. |
It'd be great to know if this was a core.async issue or something that we've done wrong. My (clearly naive) assumption was that the sub channels would be GC'd when there were no references to them, but that doesn't seem to be the case. I don't have a lot of time to look at this this week, are you able to dig into this a little bit? @danielytics may have some thoughts as he worked on this? |
I'm sorry, there is no way I'll be able to spend time on this in the near future :-( I just have to use something else for now, I can't put the project on hold. |
@danielcompton do you know if I don't know if this can happen with The per-query subs are being unsubbed, so that shouldn't be a problem either and as far as I'm aware the channels get GC'd when they are no longer references (so when Id like to see what happens if the queries are run at a slower rate (but for a longer period of time) to see if its a timing issue (I don't think it is, because the queries are run synchronously). I'd also like to run the core.async logic without rethinkdb to see if its a core.async bug or a clj-rethinkdb bug. I likely won't have time until Sunday at the earliest to try this myself. |
@jwr can you give me more details on your testing setup, including JVM opts, where RethinkDB is running, a graph of memory usage over time, how long the tests were running for, any logs created, what exact error you got, e.t.c.? I ran your test myself with a 512 MB (and 4GB) heap and while I could see the memory use going up, the JVM was able to successfully GC it away. Over the longer term, GC's started to get more frequent, which needs to be looked into further. Still, I wasn't seeing the heap exhaustion that you described. |
This looks very suspect: ASYNC-90 |
@danielcompton I don't think the exact JVM opts are important — as you later noticed, you'll exhaust the heap eventually anyway. I normally run with default JVM settings on Mac OS X, which is equivalent to |
Ouch... I believe ASYNC-90 is the cause of this. core.async creates a new mult for every topic and as ASYNC-90 says, when there are no taps left, the mult is still kept around. That means that once you sub to a topic, a mult is created that is never ever released. We use the ASYNC-90 is a rather nasty bug and it seems to have been known for about a year. I think core.async really needs more love and attention :( I'll reimplement the logic without pub/sub, but I won't have time until at earliest Sunday, if not later. |
I'm looking into a fix for this now. It looks like I should be able to use a |
@danielytics it might be worth commenting on the ASYNC-90 issue in JIRA so that everyone knows that people do run into this problem. I voted the issue up, but I don't want to comment, because I don't fully understand the problem. |
There is a memory leak (http://dev.clojure.org/jira/browse/ASYNC-90) using pub/sub. However as all of our pubs are one off, we can use unsub-all instead of unsub which will reclaim the memory for that pub. Relates to #97
This is a beautiful sight. Before and after fixing the issue (around 8:30): |
@jwr I'm pretty sure this fixes the memory leak, it did in my testing on a 64 MB VM. Can you install |
bravo! |
I'm happy to report that the leak is now gone! Thanks everyone! (Incidentally — how do you "manually" use an unreleased dependency in leiningen? I worked around this by placing my test code within a checked-out copy of clj-rethinkdb, but that wouldn't always work) |
|
Yep, when you install, it will install a JAR to your local Maven repo. You should see the version come up when you install it. I'll publish a snapshot soon though. |
Thanks for the tips! As for the bug fix, I would humbly suggest that it merits an immediate release of a new version. |
👍 agree with @jwr Great fix, @danielcompton! Nice and simple. I tried to refactor the code to just use a |
Deployed as |
🍦 |
I started working with larger data sets and I'm running out of memory. From a quick look with the yourkit profiler it seems that the heap gets filled up with
clojure.core.async$mult$reify__6387
objects, stored in aclojure.lang.PersistentHashMap$INode[32]
.A way to trigger this is simply to try to perform lots of operations:
If I let the above snippet run for several minutes, memory consumption will start rising after a while, eventually exhausting the 4GB heap.
I've attached a snapshot of the memory allocation by class, hopefully this will help.
![2015-10-07 at 22 47](https://cloud.githubusercontent.com/assets/38015/10350654/8b0ac10a-6d45-11e5-97a8-719e10be0efd.png)
The text was updated successfully, but these errors were encountered: