ARROW-7216: [Java] Improve the performance of setting/clearing individual bits #5872

Closed
liyafan82 wants to merge 3 commits into apache:master from liyafan82:fly_1120_clear

@liyafan82
Contributor

Setting and clearing individual bits are key operations for Arrow. In this issue, we improve the performance of these operations by:

  1. replacing arithmetic operations with bit-wise operations
  2. removing unnecessary casts between int/byte
  3. providing a new API to remove the if branch
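
To illustrate the first item, here is a minimal sketch of how a bit position can be split into a byte offset and a bit offset with either arithmetic or bit-wise operators. The class and method names are hypothetical and are not taken from the patch; the two forms agree only for non-negative indices.

```java
// Hypothetical stand-alone sketch; not the code from this pull request.
final class BitIndexSketch {

  // Arithmetic form: divide and take the remainder by 8.
  static int byteIndexArithmetic(int index) {
    return index / 8;
  }

  static int bitIndexArithmetic(int index) {
    return index % 8;
  }

  // Bit-wise form: shift right by 3 and mask the low three bits.
  static int byteIndexBitwise(int index) {
    return index >> 3;
  }

  static int bitIndexBitwise(int index) {
    return index & 7;
  }

  public static void main(String[] args) {
    // Both forms agree for every non-negative index checked here.
    for (int index = 0; index < 4096; index++) {
      if (byteIndexArithmetic(index) != byteIndexBitwise(index)
          || bitIndexArithmetic(index) != bitIndexBitwise(index)) {
        throw new AssertionError("mismatch at index " + index);
      }
    }
    // prints "index 21 -> byte 2, bit 5"
    System.out.println("index 21 -> byte " + byteIndexBitwise(21)
        + ", bit " + bitIndexBitwise(21));
  }
}
```

For negative inputs Java's `/` and `%` truncate toward zero while `>>` and `&` do not, so the substitution is only safe when the index is known to be non-negative.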

Benchmark results show that clearing a bit is 11% faster, and the general set/clear operation is 4.7% faster:

before:
BitVectorHelperBenchmarks.setValidityBitBenchmark avgt 5 4.524 ± 0.015 us/op

after:
BitVectorHelperBenchmarks.setValidityBitBenchmark avgt 5 4.313 ± 0.011 us/op
BitVectorHelperBenchmarks.setValidityBitToZeroBenchmark avgt 5 4.020 ± 0.016 us/op
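
The quoted percentages can be reproduced from the figures above. This is a small sketch that assumes both improvements are measured relative to the single "before" number; the class and method names are made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper for checking the percentages quoted in the description.
final class BenchmarkDelta {

  // Relative reduction in average time per operation, in percent.
  static double improvementPercent(double before, double after) {
    return (before - after) / before * 100.0;
  }

  public static void main(String[] args) {
    double before = 4.524;          // setValidityBitBenchmark, before (us/op)
    double generalAfter = 4.313;    // setValidityBitBenchmark, after (us/op)
    double clearAfter = 4.020;      // setValidityBitToZeroBenchmark, after (us/op)

    // prints "general set/clear: 4.7%"
    System.out.println(String.format(Locale.ROOT, "general set/clear: %.1f%%",
        improvementPercent(before, generalAfter)));
    // prints "clear only: 11.1%"
    System.out.println(String.format(Locale.ROOT, "clear only: %.1f%%",
        improvementPercent(before, clearAfter)));
  }
}
```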

* @param index index to be set
*/
public static void setValidityBitToZero(ArrowBuf validityBuffer, int index) {
final int byteIndex = byteIndex(index);

Contributor

can this delegate to setValidityBit instead? I think it is likely JIT will inline and do dead-code elimination?

Contributor Author

I am sorry I don't understand this comment. You mean this method should be called from setValidityBit?

Contributor

Sorry, I thought I had replied here. The performance benchmarks show that the JIT will not eliminate the branches. We should add a comment explaining why the code duplication is necessary instead of calling setValidityBit below.
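
As a rough illustration of the trade-off discussed in this thread, the sketch below contrasts a general method that branches on the requested value with a dedicated clear method that repeats the index logic and has no branch. It uses a plain `byte[]` in place of an ArrowBuf, and all names are hypothetical; it is not the code from the patch.

```java
// Hypothetical stand-alone sketch; a byte[] stands in for the validity buffer.
final class ValidityBitSketch {

  // General form: the value is only known at run time, so a branch is needed.
  static void setBit(byte[] buffer, int index, int value) {
    int byteIndex = index >> 3;
    int mask = 1 << (index & 7);
    int current = buffer[byteIndex];
    if (value != 0) {
      current |= mask;
    } else {
      current &= ~mask;
    }
    buffer[byteIndex] = (byte) current;
  }

  // Dedicated form: duplicates the index logic on purpose so that
  // clearing a bit involves no conditional at all.
  static void clearBit(byte[] buffer, int index) {
    int byteIndex = index >> 3;
    int mask = 1 << (index & 7);
    buffer[byteIndex] = (byte) (buffer[byteIndex] & ~mask);
  }

  public static void main(String[] args) {
    byte[] viaGeneral = {(byte) 0xFF, (byte) 0xFF};
    byte[] viaDedicated = {(byte) 0xFF, (byte) 0xFF};
    setBit(viaGeneral, 10, 0);
    clearBit(viaDedicated, 10);
    // Bit 10 is bit 2 of byte 1, so both print "fb".
    System.out.println(Integer.toHexString(viaGeneral[1] & 0xFF));
    System.out.println(Integer.toHexString(viaDedicated[1] & 0xFF));
  }
}
```

Whether the branch matters in practice depends on the JVM and workload; the benchmark numbers in the description are the only evidence this page offers.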

Contributor Author

I see. Thanks for the clarification.
I have added comments explicitly to discuss this.

Contributor

I know this mimics the existing setValidityBitToOne, but I think unsetBit would be a better name (we could consider adding "setBit" as an alias for setValidityBitToOne as well).

Contributor Author

Sounds good to me. Will provide the alias later.

Contributor Author

Alias provided. Please take a look. Thank you.

  final int bitIndex = bitIndex(index);
- byte currentByte = validityBuffer.getByte(byteIndex);
- final byte bitMask = (byte) (1L << bitIndex);
+ int currentByte = validityBuffer.getByte(byteIndex);

Contributor

please provide a comment on why you are assigning a byte to an int.

Contributor Author

Good point. Comment added.
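
For readers wondering what the byte-to-int assignment changes, here is a small sketch of the two styles. In Java, the operands of `|` are promoted to int during evaluation, so a byte local implies a narrowing conversion on every compound assignment, while an int local is widened once and narrowed once at the store. The names are hypothetical, and whether this is measurably faster is an assumption that depends on the JIT.

```java
// Hypothetical sketch of the two local-variable styles; not the patch itself.
final class BytePromotionSketch {

  // byte local: the mask needs an explicit cast, and "|=" hides an
  // implicit narrowing cast back to byte after the int-typed OR.
  static byte orWithByteLocal(byte stored, int bitIndex) {
    byte current = stored;
    final byte mask = (byte) (1L << bitIndex);
    current |= mask;
    return current;
  }

  // int local: the stored byte is widened once, the OR stays in int,
  // and a single narrowing cast happens when the result is written back.
  static byte orWithIntLocal(byte stored, int bitIndex) {
    int current = stored;
    current |= 1 << bitIndex;
    return (byte) current;
  }

  public static void main(String[] args) {
    // Both styles produce the same byte for every value and bit position.
    for (int value = -128; value <= 127; value++) {
      for (int bit = 0; bit < 8; bit++) {
        if (orWithByteLocal((byte) value, bit) != orWithIntLocal((byte) value, bit)) {
          throw new AssertionError("mismatch for value " + value + ", bit " + bit);
        }
      }
    }
    System.out.println("byte and int locals agree");
  }
}
```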

  final int bitIndex = bitIndex(index);
- byte currentByte = validityBuffer.getByte(byteIndex);
- final byte bitMask = (byte) (1L << bitIndex);
+ int currentByte = validityBuffer.getByte(byteIndex);

Contributor

please comment on the assignment.

Contributor Author

Comment added. Thank you.

@@ -74,12 +89,12 @@ public static void setValidityBitToOne(ArrowBuf validityBuffer, int index) {
public static void setValidityBit(ArrowBuf validityBuffer, int index, int value) {

Contributor

is this still used? can it be marked as deprecated?

Contributor Author

I am afraid not. It is still needed in scenarios where the bit value to set is not known in advance.
For example, in BaseVariableWidthVector#set(int index, int isSet, int start, int end, ArrowBuf buffer).

Contributor

@emkornfield left a comment

Thanks. A few comments/suggestions.

@emkornfield
Contributor

+1, thank you.

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…dual bits

Closes apache#5872 from liyafan82/fly_1120_clear and squashes the following commits:

cb745b5 <liyafan82>  Discuss the reason for duplicate logic
aec3b04 <liyafan82>  Use better method names
58b9735 <liyafan82>  Improve the performance of setting/clearing individual bits

Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>