PyAthena - Memory Issue #417

Open
innicoder opened this issue Mar 14, 2023 · 7 comments

Comments

@innicoder

innicoder commented Mar 14, 2023

If you remember my last request, we continually execute queries, around 1000 per hour. It seems that memory grows continually and I can't stop it.

This happens with a PandasCursor. I thought the solution was to use chunksize, but that wasn't the issue. Memory still grows by 0.1 MB per query, and since we have a daemon thread that runs 24/7, it eventually grows beyond the available memory and is never reclaimed.

I tried calling gc.collect() manually and deleting the DataFrame object, but neither helped. After 16 hours of investigation, something internal to your library seems to be the problem, and it's beyond my reach for now.

I'm looking for an idea on how to resolve this issue. Thanks.
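
One way to narrow down where memory survives `gc.collect()` is to compare two `tracemalloc` snapshots taken around a batch of identical calls. The sketch below uses only the standard library. `leaky_workload` is a hypothetical stand-in for "execute a query and drop the DataFrame", not PyAthena code, and the iteration count is arbitrary.

```python
import gc
import tracemalloc


def top_growth(workload, iterations=50, limit=5):
    """Run workload repeatedly and report the source lines whose allocations grew."""
    tracemalloc.start()
    try:
        workload()  # warm-up so one-time caches are not counted as a leak
        gc.collect()
        before = tracemalloc.take_snapshot()
        for _ in range(iterations):
            workload()
        gc.collect()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # A positive size_diff is memory that was still allocated after gc.collect().
    stats = after.compare_to(before, "lineno")
    return [(str(s.traceback), s.size_diff) for s in stats[:limit] if s.size_diff > 0]


# Hypothetical workload that deliberately retains memory on every call.
_leak = []


def leaky_workload():
    _leak.append(bytearray(10_000))


if __name__ == "__main__":
    for location, grown in top_growth(leaky_workload):
        print(f"{grown:>10} B  {location}")
```

Replacing `leaky_workload` with a function that runs one real query would list the file and line responsible for the retained allocations. Note that `tracemalloc` only sees memory allocated through Python's allocators, so growth inside native libraries may not show up.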

@innicoder
Author

@laughingman7743 Let me know if you have an idea.

Here are some of the things we talked about in the last issue, #416.

To recreate the issue, just run any query repeatedly in a Docker container. You will see memory grow by 0.1 MB each time, and that memory is never reclaimed.
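
A minimal sketch of such a reproduction loop, assuming a Linux container where `/proc/self/statm` is available. `run_query` is a hypothetical callable supplied by the caller (for example, one that executes a query on a PandasCursor and discards the result). The helper names and reporting interval are illustrative only.

```python
import gc
import os


def rss_mb():
    """Resident set size of this process in MB (Linux only, read from /proc)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / 1e6


def watch_rss(run_query, iterations=1000, report_every=100):
    """Call run_query repeatedly and sample RSS after every report_every calls."""
    samples = []
    for i in range(1, iterations + 1):
        run_query()
        if i % report_every == 0:
            gc.collect()  # rule out garbage that is merely awaiting collection
            samples.append((i, rss_mb()))
    return samples


if __name__ == "__main__":
    # Placeholder workload; swap in a function that runs one real query.
    for count, mb in watch_rss(lambda: None, iterations=500):
        print(f"after {count} calls: {mb:.1f} MB")
```

A steadily rising series that never levels off, even after the forced collections, would match the behaviour described above.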

@laughingman7743
Owner

Sorry, I don't know, but there must be a memory leak somewhere.

@laughingman7743
Owner

pandas-dev/pandas#51667 👀

@innicoder
Author

innicoder commented Mar 14, 2023 via email

@laughingman7743
Owner

When using the unload option, the read_csv method is not used. I am wondering if the same memory leakage occurs in that case.
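
For anyone wanting to try this, a sketch of creating the cursor with the unload option might look like the following. The bucket path, region, and SQL are placeholders to be supplied by the caller, and the helper name `query_with_unload` is made up for illustration. It needs AWS credentials and a real staging location to run.

```python
def query_with_unload(sql, s3_staging_dir, region_name):
    """Run sql through a PandasCursor created with unload=True; return a DataFrame."""
    # Imported lazily so this module loads even where PyAthena is not installed.
    from pyathena import connect
    from pyathena.pandas.cursor import PandasCursor

    conn = connect(
        s3_staging_dir=s3_staging_dir,
        region_name=region_name,
        cursor_class=PandasCursor,
    )
    try:
        # unload=True is the option discussed above; with it the result is
        # not read through read_csv.
        cursor = conn.cursor(unload=True)
        return cursor.execute(sql).as_pandas()
    finally:
        conn.close()
```

Running the same repeated-query loop with and without `unload=True` would show whether the growth follows the `read_csv` code path.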

@innicoder
Author

innicoder commented Mar 14, 2023 via email

@Duncan-Hunter

Duncan-Hunter commented Jun 16, 2023

Hi, I think I'm experiencing the same or a similar issue: creating new PandasCursors and running into a memory leak. I've been using objgraph to diagnose it, and I think it has something to do with this loop and the S3FileSystem. The PandasCursor creates an AthenaPandasResultSet, which creates an S3FileSystem, and then there is something to do with the AbstractFileSystem in fsspec? I'm not an expert in this sort of thing, but hopefully it helps.

image
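
For readers without objgraph installed, a rough standard-library approximation of its growth report is to count live objects by type before and after a batch of work. This is only a sketch. `FakeFileSystem` and `_cache` are hypothetical stand-ins for an object that is kept alive by a cache, not the actual PyAthena or fsspec classes.

```python
import gc
from collections import Counter


def type_counts():
    """Count live gc-tracked objects by type name."""
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())


def growth(before, after, limit=10):
    """Return (type name, increase) pairs for types whose instance count rose."""
    diff = after - before  # Counter subtraction keeps only positive counts
    return diff.most_common(limit)


class FakeFileSystem:
    """Hypothetical object that is cached instead of freed."""


_cache = []

before = type_counts()
for _ in range(100):
    _cache.append(FakeFileSystem())  # each instance stays referenced by the cache
after = type_counts()

for name, increase in growth(before, after):
    print(f"{name:<20} +{increase}")
```

Taking the counts around a batch of real cursor creations would show which types accumulate, which is the same question objgraph's `show_growth()` answers.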
