
[Python] Memorymapped arrow file conversion to parquet loads everything into RAM #25594

Open
asfimport opened this issue Jul 20, 2020 · 1 comment


When converting a memory-mapped Arrow file to a Parquet file, the whole table is loaded into RAM. This effectively negates the point of memory mapping.

If this is not a bug, perhaps there is a proper way of converting the memory-mapped Arrow file to Parquet without using excessive memory?

Example code:

    import pyarrow as pa
    import pyarrow.parquet as pq

    source = pa.memory_map(path_to_arrow_file, 'r')
    table = pa.ipc.RecordBatchFileReader(source).read_all()
    # The following line will load the whole thing into RAM
    pq.write_table(table, path_to_parquet)

Reporter: Sep Dehpour

Note: This issue was originally created as ARROW-9526. Please see the migration documentation for further details.


Antoine Pitrou / @pitrou:
When memory-mapping a file, it's the OS that decides whether to keep things in memory. If the file is kept entirely in memory, chances are it simply means no other application is requesting that memory.

Do you have reasons to believe something different is happening?
