Memory leak in Table indices #16089
I can reproduce this (on macOS too), so that's a start. Memray indeed helps visualise a slow but steady growth in resident memory while the heap size stays flat; see https://bloomberg.github.io/memray/native_mode.html for the distinction.
While trying to use it, I was also unfortunate enough to stumble upon what now looks like a CPython bug (reported and discussed at bloomberg/memray#553). Switching to Python 3.12.2 resolved the problem, so I'm now able to get a first view of native allocations. Here's the script I'm using:

```bash
# t.sh
set -euxo pipefail
rm report.bin memray-flamegraph-report.html || true
python -m memray run -o report.bin --native t.py
python -m memray flamegraph report.bin
open memray-flamegraph-report.html
```

This took me long enough to figure out, which is why I'm reporting at an early stage. I will now try to actually inspect the profile and see if it contains enough information to find the bug (or at least get a sense of where to look more closely). If it doesn't suffice, running with CPython + numpy + astropy all compiled with debug symbols would be necessary; however, I know that for numpy this is significantly simpler on Linux (macOS is supposed to be supported too, but I never managed to get anything out of it).
update: the basic profile I get with
My suspicion is that the index is generating some kind of reference cycle that doesn't get collected. I'm not sure how to approach this, but I did see a package called refcycle that might be useful.
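For illustration, here is a minimal, self-contained sketch (my own, not astropy code) of the pattern being suspected: two objects referencing each other stay alive after they go out of scope, until the cyclic garbage collector runs.

```python
import gc
import weakref

class Node:
    pass

def make_cycle():
    a, b = Node(), Node()
    a.partner = b  # a -> b
    b.partner = a  # b -> a: a reference cycle
    return weakref.ref(a)

ref = make_cycle()
print(ref() is not None)  # True: refcounting alone cannot free the pair
gc.collect()              # the cyclic collector breaks the cycle
print(ref() is None)      # True: now collected
```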
@neutrinoceros - one thing I just thought about... you might try using the
I'm trying to dig into the code here, and yes, another +1 I guess on this issue :) Plus
@MridulS - can you try with
@MridulS - if you are still having trouble reproducing, I can try adapting the original problem. Of course it is more complicated and depends on data I can't export. In that case, removing the index line changed memory use from 18 GB to 1 GB.
I am not able to properly reproduce this, at least with memray diagnostics. I would assume that if I ran this bit of code for much longer, the resident memory usage should keep on increasing (?), but it seems to level off after a certain time. In #16089 (comment) I was using just
I tried to tweak the script a bit more and now I hardly see anything off:

```python
from astropy.table.table_helpers import simple_table
import numpy as np

t = simple_table(size=2500000, cols=26)
t.add_index('a')
slice_ = np.random.randint(240000, size=2500000)
for _ in range(400):
    t2 = t[slice_]
```

Maybe I am missing something here; @neutrinoceros, any thoughts?
Any chance this is hardware-specific? This was reported in the OP:
It's been seen in the wild on both Linux and macOS.
Thanks @MridulS for taking a look!
The main difference that I see here is that instead of generating a new selector at each iteration, we generate just one and access it repeatedly in a loop. I also do not see a leak with this version of the script, though I don't have a clear idea yet of what difference that would make; a variant that regenerates the selector each time is sketched below.
Even though Python's garbage collector is, understandably, not very efficient at collecting dead reference cycles, I believe it does catch up with them, albeit at a somewhat lower frequency than with ordinary garbage, so maybe that's why the leak saturates at some point? (This is still just guessing.) Anyway, I think proper inspection will indeed require heavy machinery. I'll make a serious attempt at it this week!
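For reference, a minimal sketch (mine, not from the thread) of that variant, which draws a fresh selector on every pass and is closer to how the leak was originally triggered:

```python
from astropy.table.table_helpers import simple_table
import numpy as np

t = simple_table(size=2500000, cols=26)
t.add_index('a')
for _ in range(400):
    # regenerate the selector on each iteration instead of reusing one
    slice_ = np.random.randint(240000, size=2500000)
    t2 = t[slice_]
```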
@neutrinoceros @MridulS - since you are not reproducing the really troublesome leak we saw, why don't you hold off on further debugging of this until I get you a better example.
Works for me.
@neutrinoceros @MridulS - sorry for not understanding the problem a little better before opening this, but I think the example (now updated in the description) reliably generates the memory leak. I see it consume about 2.4 GB each time I run the final
The missing ingredient before was the
One obvious mitigation is to have
Thanks a lot! I confirm that the updated example leaks very efficiently on my machine too. I also confirm that using `t["idx"] = MaskedColumn(idxs).data.data` avoids the leak.
I also tried `t["idx"] = np.ma.MaskedArray(idxs)`, and this still reproduces the leak, so I'm starting to suspect that the bug lives in numpy.
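For side-by-side reference, a sketch of the three assignments discussed (with a small stand-in `idxs`; the real example uses a much larger table):

```python
import numpy as np
from astropy.table import Table, MaskedColumn

t = Table()
idxs = np.arange(1000)  # stand-in for the index data in the real example

t["idx"] = MaskedColumn(idxs)            # masked column: the updated example leaks
t["idx"] = np.ma.MaskedArray(idxs)       # plain masked array: still leaks
t["idx"] = MaskedColumn(idxs).data.data  # bare ndarray underneath: no leak observed
```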
@neutrinoceros - assigning a
Ah, of course. Then there's a chance we're just producing reference cycles in the Python layer. I'll dig in this direction today!
If you want to give the refcycle theory a go, you can use https://docs.python.org/3/library/gc.html#gc.set_debug with https://docs.python.org/3/library/gc.html#gc.DEBUG_LEAK to confirm which objects are part of cycles.
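A minimal sketch of that diagnostic (my illustration, reusing the simple_table setup from earlier in the thread; `DEBUG_LEAK` implies `DEBUG_SAVEALL`, so everything the collector finds is kept in `gc.garbage` for inspection):

```python
import gc

import numpy as np
from astropy.table.table_helpers import simple_table

# DEBUG_LEAK = DEBUG_COLLECTABLE | DEBUG_UNCOLLECTABLE | DEBUG_SAVEALL:
# the collector reports unreachable objects and keeps them in gc.garbage
gc.set_debug(gc.DEBUG_LEAK)

t = simple_table(size=1000, cols=3)
t.add_index('a')
t2 = t[np.random.randint(500, size=1000)]
del t2

gc.collect()
print(len(gc.garbage), "objects were only reclaimable via cycle collection")
```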
Thanks a ton @pablogsal! @MridulS @taldcroft, just to keep you in the loop: I haven't forgotten about this. I'm just prioritising #16070 for now, but this one is the next item on my list!
Description
Repeatedly accessing an indexed `Table` causes memory use to grow in an unexpected and undesired way. In a real-world application on a large table this was causing memory use to exceed 18 GB; removing the table index and repeating the access code kept memory use below 1 GB. We used `memray` to see memory climbing continuously during a loop which repeatedly accessed elements of an indexed table.

Expected behavior
Memory use should remain approximately constant after the first access.
How to Reproduce
The following should reproduce the problem. You can use a package like `memray` to monitor memory, or just watch a process monitor for the memory use of the Python process. For me this starts with about 180 MB of memory after the first table `t` is created.
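A sketch of a reproducer along these lines, reconstructed from the ingredients discussed in this thread (a large table, a masked integer column, a table index, and repeated fancy indexing with fresh selectors); the exact script may differ:

```python
import numpy as np
from astropy.table import MaskedColumn
from astropy.table.table_helpers import simple_table

t = simple_table(size=2500000, cols=26)
t["idx"] = MaskedColumn(np.arange(len(t)))  # the masked-column ingredient
t.add_index('a')                            # the table-index ingredient

for _ in range(40):
    # a fresh selector on each pass, as in the reports above
    selector = np.random.randint(len(t), size=len(t))
    t2 = t[selector]
```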
After running this, memory use is around 1 GB, while I would expect something under 400 MB.

Versions