ARROW-17583: [C++][Python] Changed data width of WrittenFile.size to int64 to match C++ code #14032
Conversation
I'm using this script to reproduce the problem:
It's a bit cumbersome and takes a minute or so, so I don't think it is suitable to add as a unit test.
There's a failure in
That seems to be https://issues.apache.org/jira/browse/ARROW-17614
We do have a
Is there anything I can do? I'd be happy to run additional tests if needed.
Let's just merge this as is. Thanks for the PR!
Benchmark runs are scheduled for baseline = 6ff5224 and contender = 43670af. 43670af is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
…nt64 to match C++ code (apache#14032)

To fix an exception while writing large parquet files:

```
Traceback (most recent call last):
  File "pyarrow/_dataset_parquet.pyx", line 165, in pyarrow._dataset_parquet.ParquetFileFormat._finish_write
  File "pyarrow/dataset.pyx", line 2695, in pyarrow._dataset.WrittenFile.__init__
OverflowError: value too large to convert to int
Exception ignored in: 'pyarrow._dataset._filesystemdataset_write_visitor'
```

Authored-by: Joost Hoozemans <joosthooz@msn.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>