Memory leak in Table indices #16089

Open

taldcroft opened this issue Feb 21, 2024 · 24 comments

Comments
@taldcroft
Member

taldcroft commented Feb 21, 2024

Description

Repeatedly accessing an indexed Table causes memory use to grow in an unexpected and undesired way. In a real-world application on a large table this caused memory use to exceed 18 GB; removing the table index and repeating the same access code kept memory use below 1 GB. We used memray to watch memory climb continuously during a loop that repeatedly accessed elements of an indexed table.

Expected behavior

Memory use should remain approximately constant after the first access.

How to Reproduce

The following should reproduce the problem. You can use a package like memray to monitor memory or just watch a process monitor for memory use of the Python process. For me this starts with about 180 MB of memory after the first table t is created. After running this, memory use is around 1 GB, whereas I would expect something under 400 MB.

import numpy as np
from astropy.table import MaskedColumn
from astropy.table.table_helpers import simple_table
from astropy.time import Time
from tqdm import tqdm

size = 250000
t = simple_table(size=size, cols=26)
idxs = Time(np.random.randint(0, size // 20, size=size), format="cxcsec").isot
t["idx"] = MaskedColumn(idxs)  # THIS IS THE PROBLEM
t.add_index(["idx"])

idxs = np.random.choice(t["idx"], size=100, replace=False)
for idx in tqdm(idxs):
    star_obs = t[t["idx"] == idx]

Versions

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import astropy; print("astropy", astropy.__version__)
import numpy; print("Numpy", numpy.__version__)
import erfa; print("pyerfa", erfa.__version__)
try:
    import scipy
    print("Scipy", scipy.__version__)
except ImportError:
    print("Scipy not installed")
try:
    import matplotlib
    print("Matplotlib", matplotlib.__version__)
except ImportError:
    print("Matplotlib not installed")
macOS-14.2.1-x86_64-i386-64bit
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:27:35) [Clang 14.0.6 ]
astropy 5.3.1
Numpy 1.23.5
pyerfa 2.0.0.1
Scipy 1.10.0
Matplotlib 3.6.3
@neutrinoceros
Contributor

I can reproduce this (on macOS too), so that's a start.
First, I attempted to run garbage collection every 10th iteration of the loop (just because it's easy to test): no change, so it seems safe to conclude the problem isn't trivial (though garbage collection isn't completely out of the picture; maybe some unreachable reference cycles are being generated).
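
In terms of the MWE, that experiment looked roughly like this (sketch only):

import gc

for i, idx in enumerate(tqdm(idxs)):
    star_obs = t[t["idx"] == idx]
    if i % 10 == 0:
        gc.collect()  # forcing collection every 10th iteration: no visible change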

Memray indeed shows a slow but steady growth in resident memory, while the heap size stays flat.
I found that with memray's default behaviour you don't get much more detail (most of the allocations are not traced).
Enabling the --native flag allows tracing allocations from C/C++ extensions, which seems to be what we want here; however, there are a number of known limitations for this on macOS:

quoting https://bloomberg.github.io/memray/native_mode.html

For the best native mode experience, we recommend running your program on Linux using an interpreter and libraries built with as much debugging information as possible.

in general native mode symbolification will not work well in macOS.

While trying to use it, I was also unfortunate enough to stumble upon what now looks like a CPython bug (reported and discussed at bloomberg/memray#553). Switching to Python 3.12.2 resolved the problem, so I'm now able to get a first view of native allocations. Here's the script I'm using (with t.py containing the MWE from @taldcroft):

# t.sh
set -euxo pipefail
rm report.bin memray-flamegraph-report.html || true
python -m memray run -o report.bin --native t.py
python -m memray flamegraph report.bin
open memray-flamegraph-report.html

This took me long enough to figure out, which is why I'm reporting at an early stage. I will now try to actually inspect the profile and see if it contains enough information to find the bug (or get a sense of where to look more closely).

If it doesn't suffice, running with CPython + numpy + astropy all compiled with debug symbols would be necessary; however, I know that for numpy this is significantly simpler on Linux (macOS is supposed to be supported too, but I could never get anything out of it).
I was planning to set up a Linux VM at some point; this may be the excuse I've been waiting for. Before I do that, though, I want to ask @MridulS whether he happens to be in a better starting position to try this.

@neutrinoceros
Contributor

Update: the basic profile I get with --native but a non-debug interpreter + numpy isn't really giving anything useful. What I don't understand yet is that it seems to only trace heap allocations, which is not where the leak shows up.

@taldcroft
Member Author

My suspicion is that the index is generating some kind of reference cycle that doesn't get collected. I'm not sure how to approach this, but I did see a package called refcycle that might be useful.
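
As a first check without extra dependencies, something like this (a rough sketch using the names from the MWE) could tell us whether an old slice stays alive after being rebound:

import gc
import weakref

star_obs = t[t["idx"] == idxs[0]]
ref = weakref.ref(star_obs)   # weak reference to the first slice
star_obs = t[t["idx"] == idxs[1]]
gc.collect()
print("previous slice still alive:", ref() is not None)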

@taldcroft
Member Author

@neutrinoceros - one thing I just thought about: you might try using the SCEngine sorting engine (instead of the default sorted array) and see if the leak persists. That might help localize the problem. See: https://docs.astropy.org/en/stable/table/indexing.html#engines
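
Something along these lines (untested sketch; SCEngine needs the sortedcontainers package installed):

from astropy.table import SCEngine

t.remove_indices("idx")                # drop the existing index from the MWE
t.add_index(["idx"], engine=SCEngine)  # rebuild it with the SCEngine backend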

@MridulS
Contributor

MridulS commented Feb 22, 2024

I'm trying to dig into the code here, and yes another +1 I guess on this issue :)

Plus SCEngine seems to make it worse!

@MridulS
Contributor

MridulS commented Feb 22, 2024

Alright, I did some more digging with memray and it doesn't seem like setting the index makes any difference (?)
I ran the following script and have attached the resident and heap size graphs.

from astropy.table.table_helpers import simple_table
import numpy as np

t = simple_table(size=250000, cols=26)
# t.add_index('a')
for _ in range(2_000):
    t[np.random.randint(240000, size=250000)]

Without index: [screenshot: memray resident memory and heap size plot]

With index (it just took a lot more time): [screenshot: memray resident memory and heap size plot]

Is there any other minimal example I can try playing around with?

@taldcroft
Member Author

@MridulS - can you try with t2 = t[np.random.randint(240000, size=250000)]? I know that should not make a difference, but in my testing it seemed like it did. This sort of testing is always tricky, and maybe I got confused.

@taldcroft
Member Author

@MridulS - if you are still having trouble reproducing, I can try adapting the original problem. Of course it is more complicated and depends on data I can't export. In that case, removing the index line changed memory use from 18 GB to 1 GB.

@MridulS
Contributor

MridulS commented Feb 23, 2024

I am not able to properly reproduce this, at least with memray diagnostics. I would assume that if I ran this bit of code for a lot longer, the resident memory usage should keep increasing (?). It seems to level off after a certain time.

In #16089 (comment) I was just using htop to look at memory usage of the ipython process and the memory peaking was visible, but I guess I didn't have a proper quantitative diagnostic for it.

I tried to tweak the script a bit more and now I hardly see anything off?

from astropy.table.table_helpers import simple_table
import numpy as np

t = simple_table(size=2500000, cols=26)
t.add_index('a')
slice_ = np.random.randint(240000, size=2500000)

for _ in range(400):
    t2 = t[slice_]

[screenshot: memray resident memory and heap size plot]

Maybe I am missing something here @neutrinoceros any thoughts?

@pllim
Member

pllim commented Feb 23, 2024

Any chance this is hardware specific? This was reported in OP:

macOS-14.2.1-x86_64-i386-64bit
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:27:35) [Clang 14.0.6 ]

@taldcroft
Member Author

It's been seen in the wild on both Linux and Mac.

@neutrinoceros
Contributor

Thanks @MridulS for taking a look!

I tried to tweak the script a bit more and now I hardly see anything off?

The main difference I see here is that instead of generating a new selector at each iteration, we generate just one and access it repeatedly in a loop. I also do not see a leak with this version of the script, though I don't have a clear idea yet why that would make a difference.
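
Spelled out, the two access patterns being compared are roughly (sketch, reusing the setup from your latest script):

import numpy as np
from astropy.table.table_helpers import simple_table

t = simple_table(size=2500000, cols=26)
t.add_index('a')

for _ in range(400):
    t2 = t[np.random.randint(240000, size=2500000)]   # a fresh selector every iteration

sel = np.random.randint(240000, size=2500000)
for _ in range(400):
    t2 = t[sel]                                        # one selector, reused (as in your latest script)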

I would assume if I ran this bit of code for a lot more time, the resident memory usage should keep on increasing (?) It seems to level off a certain time.

Even though Python's garbage collector is, understandably, not very efficient at collecting dead reference cycles, I believe it does catch up with them, just at a lower frequency than regular collections, so maybe that's why the leak saturates at some point? (This is still just guessing.)

Anyway, I think proper inspection will indeed require heavy machinery. I'll make a serious attempt at it this week!

@taldcroft
Member Author

@neutrinoceros @MridulS - since you are not reproducing the really troublesome leak we saw, why don't you hold off on further debugging of this until I get you a better example.

@neutrinoceros
Contributor

Works for me.

@taldcroft
Member Author

taldcroft commented Feb 26, 2024

@neutrinoceros @MridulS - sorry for not understanding the problem a little better before opening this, but I think the example (now updated in the description) reliably generates the memory leak. I see it consume about 2.4 GB each time I run the final for loop.

The missing ingredient before was the MaskedColumn, but I also made this more like our original example. I'm not sure exactly what is important but maybe having a wider string column as the index column is contributing to more memory use.

@taldcroft
Member Author

One obvious mitigation is to have add_index not allow using a MaskedColumn.
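
Something like this, roughly (hypothetical guard for illustration, not a proposed final API):

from astropy.table import MaskedColumn

def add_index_checked(table, colnames, **kwargs):
    """Refuse to build an index that involves a MaskedColumn (illustration only)."""
    if isinstance(colnames, str):
        colnames = [colnames]
    for name in colnames:
        if isinstance(table[name], MaskedColumn):
            raise TypeError(f"cannot index masked column {name!r} (see #16089)")
    table.add_index(colnames, **kwargs)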

@neutrinoceros
Contributor

neutrinoceros commented Feb 26, 2024

Thanks a lot! I confirm that the updated example leaks very efficiently on my machine too. I also confirm that using

t["idx"] = MaskedColumn(idxs).data.data

avoids the leak.
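
In terms of the MWE, I believe this is equivalent to assigning the underlying ndarray directly, so that the index is built on a plain (unmasked) Column:

t["idx"] = np.asarray(idxs)   # plain ndarray, so the column is a regular Column
t.add_index(["idx"])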

@neutrinoceros
Contributor

Also tried

t["idx"] = np.ma.MaskedArray(idxs)

And this still reproduces the leak, so I'm starting to suspect that the bug lives in numpy.
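
A pure-numpy experiment that could test this theory (just a sketch, no Table involved):

import numpy as np
from tqdm import tqdm

size = 250000
vals = np.array([f"val_{i:06d}" for i in np.random.randint(0, size // 20, size=size)])
arr = np.ma.MaskedArray(vals)
for v in tqdm(np.random.choice(vals, size=100, replace=False)):
    sel = arr[arr == v]   # does repeated boolean indexing of a masked string array leak?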

@MridulS
Contributor

MridulS commented Feb 26, 2024

Ohhhh, this produces a beautiful leak.
[screenshot: memray memory plot]

@mhvk
Contributor

mhvk commented Feb 27, 2024

@neutrinoceros - assigning a MaskedArray to a table gives a MaskedColumn - so no joy there...
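
That is:

t["idx"] = np.ma.MaskedArray(idxs)
print(type(t["idx"]))   # -> astropy.table.column.MaskedColumn, so astropy is still in the loop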

@neutrinoceros
Contributor

Ah, of course. Then there's a chance we're just producing reference cycles in the Python layer. I'll dig in this direction today!

@pablogsal

👋 Hey, memray author and Python core dev here. I ran your example with native symbols and debug info and this is what I get:
[screenshot: memray flamegraph showing ~2.1 GB allocated at astropy/table/column.py:529]

So it looks like most of the memory (2.1 GB) is allocated at astropy/table/column.py:529. You can easily reproduce this using docker:

$ docker run --platform linux/amd64 -v $PWD:/src --rm -it ubuntu:latest bash
$ apt-get update && apt-get install -y python3-numpy debuginfod python3-pip
$ python3 -m pip install astropy memray
$ export DEBUGINFOD_URLS="https://debuginfod.ubuntu.com"
$ python3 -m memray run -o output.bin --trace-python-allocators --native example.py
$ python3 -m memray flamegraph output.bin --leaks

@pablogsal

pablogsal commented Feb 27, 2024

If you want to give the reference-cycle theory a go, you can use gc.set_debug (https://docs.python.org/3/library/gc.html#gc.set_debug) with gc.DEBUG_LEAK (https://docs.python.org/3/library/gc.html#gc.DEBUG_LEAK) to confirm which objects are part of cycles.
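
For example (sketch, using the names from the MWE above):

import gc

gc.set_debug(gc.DEBUG_LEAK)   # implies DEBUG_SAVEALL, so collected objects end up in gc.garbage
for idx in tqdm(idxs):
    star_obs = t[t["idx"] == idx]
gc.collect()
print(len(gc.garbage), "objects found by the collector")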

@neutrinoceros
Contributor

Thanks a ton @pablogsal!

@MridulS @taldcroft just to keep you posted, I haven't forgotten about this. I'm just prioritising #16070 for now, but this one is the next item on my list!
