Skip to content

GH-3585: Fix DataPageHeaderV2.num_nulls=-1 when column statistics are disabled#3586

Open
keuin wants to merge 1 commit into
apache:masterfrom
keuin:master
Open

GH-3585: Fix DataPageHeaderV2.num_nulls=-1 when column statistics are disabled#3586
keuin wants to merge 1 commit into
apache:masterfrom
keuin:master

Conversation

@keuin
Copy link
Copy Markdown

@keuin keuin commented May 25, 2026

Rationale for this change

Fix the num_nulls field in DataPageHeaderV2, which will be unexpectedly set to -1 when statistics is disabled.

What changes are included in this PR?

Add a counter tracker in ColumnValueCollector, which is always enabled, to track required field num_nulls.

Are these changes tested?

Yes, tested. The test will fail before this fix, and will succeed with this fix.

Are there any user-facing changes?

No user-facing or API changes.

Closes #3585

this.nullCount = 0;
this.statistics = statisticsEnabled
? Statistics.createStats(path.getPrimitiveType())
: Statistics.noopStats(path.getPrimitiveType());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to change Statistics.noopStats to make it possible to ignore min/max stats but still collects null count?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible, but putting it outside Statistics is a design choice. By reading code I think the Statistics object is a 1:1 mapping to parquet header's statistics object, which is optional and may be disabled by user.

Putting counter of num_nulls outside statistics object makes the code clearer, because the code maps clearly to parquet header structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DataPageHeaderV2.num_nulls=-1 when column statistics are disabled

2 participants