Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PARQUET-1341: Fix null count stats in unsigned-sort columns. #499

Merged
merged 2 commits into from Jul 3, 2018

Conversation

rdblue
Copy link
Contributor

@rdblue rdblue commented Jun 28, 2018

When a column is all nulls, it has no min and max values, causing it to be handled with the conversion from min and max (not the newer min_value and max_value). In that path, stats are suppressed if the column's expected sort order is not a signed order because the old code only used signed ordering.

This PR always adds the null count, regardless of the handling for min and max or min_value and max_value.

@rdblue
Copy link
Contributor Author

rdblue commented Jun 28, 2018

@gszadovszky, can you have a look at this? I think it is a (minor) bug from the recent stats work.

Copy link
Contributor

@gszadovszky gszadovszky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this.
+1

@nandorKollar
Copy link
Contributor

There's a failing unit test, which seems to be related to this change:

Failed tests: testIgnoreStatsWithSignedSortOrder(org.apache.parquet.format.converter.TestParquetMetadataConverter): Stats should be empty: num_nulls: 3, min/max not defined

@rdblue
Copy link
Contributor Author

rdblue commented Jul 3, 2018

@nandorKollar, the test failure is because null count is no longer ignored when min and max are. I need to update the test before merging this.

@rdblue rdblue merged commit d320a45 into apache:master Jul 3, 2018
ghost pushed a commit to RMS/parquet-mr that referenced this pull request Aug 18, 2018
)

* Fix null count stats in unsigned-sort columns.
* Fix test case for old min/max values and unsigned ordering.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants