
[common] Optimize getNullCounts() to return int[] instead of Long[] #3054

Merged
wuchong merged 2 commits into apache:main from
platinumhamburg:optimize-null-counts-int-array
Apr 12, 2026


@platinumhamburg (Contributor)

Since null counts are stored as 4-byte integers in the batch statistics binary format, int[] is sufficient and avoids boxing overhead: each Long[] element is a reference to a separately allocated Long object wrapping an 8-byte value, whereas each int[] element is a 4-byte primitive stored inline in the array. Use -1 as the sentinel for "not available" instead of null. This reduces memory usage for cachedNullCounts and statsNullCounts, especially for wide tables with many fields.

Closes #3021
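The idea in the description can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the class NullCountsSketch, the decode and hasNullCount methods, the NULL_COUNT_UNAVAILABLE constant, and the binary layout (consecutive little-endian 4-byte counts, one per field that carries statistics) are all hypothetical. It shows the two points of the change: counts are read straight into a primitive int[], and -1 marks a field with no recorded count where a Long[] would have held null.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/**
 * Hypothetical sketch (not the Fluss implementation): per-field null counts
 * kept in a primitive int[] with -1 meaning "not available".
 */
public final class NullCountsSketch {
    /** Sentinel for "no null count recorded for this field" (replaces a null Long). */
    public static final int NULL_COUNT_UNAVAILABLE = -1;

    private final int[] nullCounts;

    private NullCountsSketch(int[] nullCounts) {
        this.nullCounts = nullCounts;
    }

    /**
     * Decodes null counts. ASSUMED layout, for illustration only: one
     * little-endian int32 per entry of statsFieldIndexes, in that order,
     * starting at the buffer's current position.
     */
    public static NullCountsSketch decode(ByteBuffer buffer, int fieldCount, int[] statsFieldIndexes) {
        int[] counts = new int[fieldCount];
        // Fields without statistics keep the sentinel instead of null.
        Arrays.fill(counts, NULL_COUNT_UNAVAILABLE);
        // Read from a duplicate so the caller's buffer position is left unchanged.
        ByteBuffer in = buffer.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        for (int fieldIndex : statsFieldIndexes) {
            counts[fieldIndex] = in.getInt(); // 4 bytes per count, no boxing
        }
        return new NullCountsSketch(counts);
    }

    /** Returned without copying to avoid allocation; this sketch assumes callers treat it as read-only. */
    public int[] getNullCounts() {
        return nullCounts;
    }

    /** Callers test for the sentinel where they previously tested for null. */
    public boolean hasNullCount(int fieldIndex) {
        return nullCounts[fieldIndex] != NULL_COUNT_UNAVAILABLE;
    }

    public static void main(String[] args) {
        // Three fields; only fields 0 and 2 carry statistics (5 and 0 nulls).
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(5).putInt(0);
        buf.flip();
        NullCountsSketch stats = decode(buf, 3, new int[] {0, 2});
        System.out.println(Arrays.toString(stats.getNullCounts())); // prints [5, -1, 0]
        System.out.println(stats.hasNullCount(1)); // prints false
    }
}
```

With a Long[], every present count outside the small range cached by Long.valueOf (-128 to 127) would be a separate heap object reached through a reference; the int[] keeps the values inline. The trade-off is that callers must compare against the sentinel rather than against null, and a count of 0 stays distinguishable from "not available".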

Purpose

Linked issue: close #3021

Brief change log

Tests

API and Format

Documentation

platinumhamburg and others added 2 commits April 10, 2026 15:23
@wuchong (Member) left a comment


LGTM. I pushed a commit to fix the compile problem.

@platinumhamburg (Contributor, Author)

> LGTM. I pushed a commit to fix the compile problem.

Thanks, LGTM.

@wuchong merged commit 1343e85 into apache:main on Apr 12, 2026
14 of 16 checks passed


Development

Successfully merging this pull request may close these issues.

Optimize DefaultLogRecordBatchStatistics.getNullCounts() to return int[] instead of Long[]
