Skip to content

ARROW-6901: [Rust] [Parquet] Increment total_num_rows when closing a row group#5672

Closed
bw-matthew wants to merge 3 commits intoapache:masterfrom
bw-matthew:NT-metadata-row-count
Closed

ARROW-6901: [Rust] [Parquet] Increment total_num_rows when closing a row group#5672
bw-matthew wants to merge 3 commits intoapache:masterfrom
bw-matthew:NT-metadata-row-count

Conversation

@bw-matthew
Copy link
Copy Markdown
Contributor

This means that the total_num_rows written to the file will accurately reflect the row groups that were successfully written.

Will need to work on a fix and then reverse those changes.

This can be run with:

```
git submodule update --init
(
    cd rust
    PARQUET_TEST_DATA="$(git rev-parse --show-toplevel)/cpp/submodules/parquet-testing/data" ARROW_TEST_DATA="$(git rev-parse --show-toplevel)/testing/data" cargo +nightly-2019-09-25 test
)
```
@bw-matthew
Copy link
Copy Markdown
Contributor Author

It's failed linting, going to make those changes.

@github-actions
Copy link
Copy Markdown

@bw-matthew bw-matthew changed the title ARROW-6901 Increment total_num_rows when closing a row group ARROW-6901: [Rust] Increment total_num_rows when closing a row group Oct 16, 2019
@bw-matthew bw-matthew changed the title ARROW-6901: [Rust] Increment total_num_rows when closing a row group ARROW-6901: [Rust] [Parquet] Increment total_num_rows when closing a row group Oct 16, 2019
Copy link
Copy Markdown
Contributor

@paddyhoran paddyhoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me, but I will let @sunchao or @sadikovi comment. Shouldn't total_num_rows be u64 instead of i64?

@bw-matthew
Copy link
Copy Markdown
Contributor Author

I changed it to i64 because the RowGroupMetaData has i64 for the num_rows field (and the ColumnChunkMetaData has i64 for the num_values). All of these values cannot reasonably be negative so I understand what you are saying.

If you think it would be better to change them then I can do that, however I'm cautious about that approach because when I tried it it started interacting with things like the thrift conversion.

Copy link
Copy Markdown
Contributor

@sadikovi sadikovi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for fixing it.
Did you test that manually?

I had my private branch with the fixes, but I forgot to merge it somehow into upstream back then.

@bw-matthew
Copy link
Copy Markdown
Contributor Author

This PR updates the round trip tests to check that the correct value is present when reading the file metadata from the written file. Thanks for merging this.

@bw-matthew
Copy link
Copy Markdown
Contributor Author

Just tested my branch again, it writes the correct row count now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants