[Python] Files opened for read with pyarrow.parquet are not explicitly closed #29391

Closed
asfimport opened this issue Aug 26, 2021 · 5 comments

asfimport commented Aug 26, 2021

It appears that files opened for reading with pyarrow.parquet.read_table (and therefore pyarrow.parquet.ParquetDataset) are not explicitly closed.

This seems to be the case for both use_legacy_dataset=True and use_legacy_dataset=False. The files do not remain open at the OS level (verified with lsof), but they appear to rely on the Python garbage collector to close them.

My use case is that I'd like to use a custom fsspec filesystem that interfaces with an S3-like API. It downloads the remote Parquet file to a local temporary file and passes pyarrow a handle to that file. It then waits for an explicit close() or __exit__() call so that it can clean up the temp file.
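To make the use case above concrete, here is a minimal standard-library sketch of a temp-file-backed handle whose cleanup depends on close() being called. The names TempBackedFile and reader_that_never_closes are hypothetical and only stand in for the custom fsspec file object and for a reader that behaves as reported; this is not the reporter's code and not pyarrow's.

```python
import os
import tempfile


class TempBackedFile:
    """Hypothetical handle: a local temp copy that is deleted on close()."""

    def __init__(self, payload: bytes):
        # Stand-in for the remote download: write the payload to a temp file.
        fd, self.path = tempfile.mkstemp(suffix=".parquet")
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
        self._file = open(self.path, "rb")

    def read(self, size=-1):
        return self._file.read(size)

    def close(self):
        # Cleanup only happens if the consumer calls close() (or uses `with`).
        if not self._file.closed:
            self._file.close()
            os.remove(self.path)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


def reader_that_never_closes(handle):
    """Hypothetical reader mimicking the report: reads, never calls close()."""
    return handle.read()


# Without an explicit close, the temp file is left behind on disk.
leaked = TempBackedFile(b"PAR1")
reader_that_never_closes(leaked)
leaked_still_on_disk = os.path.exists(leaked.path)

# With a `with` block, __exit__ calls close() and the temp file is removed.
with TempBackedFile(b"PAR1") as handle:
    data = reader_that_never_closes(handle)
cleaned_up = not os.path.exists(handle.path)

leaked.close()  # tidy up the demo's own leftover
```

Until the reader closes handles itself, wrapping the handle in a `with` block on the caller's side, as in the second half of the sketch, is one way to get deterministic cleanup.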

Environment: fsspec 2021.4.0
Reporter: Richard Kimoto
Assignee: Miles Granger / @milesgranger

Note: This issue was originally created as ARROW-13763. Please see the migration documentation for further details.


Antoine Pitrou / @pitrou:
Thanks for the report. It seems that, when a file or directory path is given (as opposed to an open file object), Arrow should explicitly close all files it opens by itself.

Some of this may be in the C++ dataset layer, some of this in the Python Parquet wrapper.
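The ownership rule described in this comment can be sketched in plain Python as follows. This is only an illustration of the principle (close what you opened, leave caller-supplied file objects alone); read_all is a hypothetical helper, not Arrow's implementation.

```python
import os
import tempfile


def read_all(source):
    """Read bytes from a path or from an already open binary file object."""
    if isinstance(source, (str, os.PathLike)):
        # We opened the file ourselves, so we close it deterministically.
        with open(source, "rb") as f:
            return f.read()
    # The caller opened it, so the caller decides when to close it.
    return source.read()


# Demo setup: a small temporary file to read from.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"abc")

from_path = read_all(path)  # opened and closed inside read_all

caller_file = open(path, "rb")
from_handle = read_all(caller_file)
still_open = not caller_file.closed  # read_all left the caller's file open

caller_file.close()
os.remove(path)
```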

Joris Van den Bossche / @jorisvandenbossche:
For reference, the GitHub issue where I already answered a bit: #10965

In pyarrow.parquet.ParquetFile, we indeed don't close the file, and there is no close method to do so. The Parquet reader seems to get a RandomAccessFile handle that is created by opening the file with ReadableFile (which in turn creates an OSFile). The C++ ReadableFile also doesn't seem to have a public method to close it. There is a private DoClose; should that be made public so that higher layers can make sure the ReadableFile is closed after use?


Antoine Pitrou / @pitrou:
RandomAccessFile has a Close method. Am I missing something?


Antoine Pitrou / @pitrou:
Issue resolved by pull request 13821
#13821
