Merged
25 changes: 6 additions & 19 deletions datafusion/spark/src/function/bitwise/bit_count.rs
@@ -102,24 +102,25 @@ fn spark_bit_count(value_array: &[ArrayRef]) -> Result<ArrayRef> {
         DataType::Int8 => {
             let result: Int32Array = value_array
                 .as_primitive::<Int8Type>()
-                .unary(|v| bit_count(v.into()));
+                .unary(|v| (v as i64).count_ones() as i32);
             Ok(Arc::new(result))
         }
         DataType::Int16 => {
             let result: Int32Array = value_array
                 .as_primitive::<Int16Type>()
-                .unary(|v| bit_count(v.into()));
+                .unary(|v| (v as i64).count_ones() as i32);
             Ok(Arc::new(result))
         }
         DataType::Int32 => {
             let result: Int32Array = value_array
                 .as_primitive::<Int32Type>()
-                .unary(|v| bit_count(v.into()));
+                .unary(|v| (v as i64).count_ones() as i32);
             Ok(Arc::new(result))
         }
         DataType::Int64 => {
-            let result: Int32Array =
-                value_array.as_primitive::<Int64Type>().unary(bit_count);
+            let result: Int32Array = value_array
+                .as_primitive::<Int64Type>()
+                .unary(|v| v.count_ones() as i32);
             Ok(Arc::new(result))
         }
         DataType::UInt8 => {
@@ -155,20 +156,6 @@ fn spark_bit_count(value_array: &[ArrayRef]) -> Result<ArrayRef> {
     }
 }

Contributor Author

FYI, the removed bit_count function is a SWAR (SIMD within a register) Hamming weight implementation. Per the code comments in java.lang.Long, it comes from Hacker's Delight.

What's interesting is that the Rust compiler generates something very similar when calling count_ones().

With a sufficiently recent target architecture, though, you get popcnt instead.
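
As a rough illustration of the point above, the sketch below places the removed SWAR routine next to count_ones() and checks that the two agree, including for a negative Int8 value widened to i64 the way the new code does it. The names swar_bit_count and builtin_bit_count and the sample values are hypothetical, chosen only for this sketch. Whether the compiler actually emits popcnt depends on the target features enabled at build time, which this sketch does not verify.

```rust
/// SWAR Hamming weight, following the same steps as the removed `bit_count`.
/// The name `swar_bit_count` is hypothetical and used only in this sketch.
fn swar_bit_count(i: i64) -> i32 {
    let mut u = i as u64;
    // Count the set bits in each 2-bit pair.
    u = u - ((u >> 1) & 0x5555555555555555);
    // Sum adjacent pairs into 4-bit nibbles.
    u = (u & 0x3333333333333333) + ((u >> 2) & 0x3333333333333333);
    // Sum adjacent nibbles into bytes.
    u = (u + (u >> 4)) & 0x0f0f0f0f0f0f0f0f;
    // Fold the per-byte counts together; the low 7 bits hold the total (at most 64).
    u = u + (u >> 8);
    u = u + (u >> 16);
    u = u + (u >> 32);
    (u as i32) & 0x7f
}

/// What the new code does for the signed types: widen to i64, then count_ones().
/// Also a hypothetical helper name.
fn builtin_bit_count(i: i64) -> i32 {
    i.count_ones() as i32
}

fn main() {
    let samples: [i64; 6] = [0, 1, -1, 255, i64::MIN, i64::MAX];
    for v in samples {
        assert_eq!(swar_bit_count(v), builtin_bit_count(v));
    }
    // Sign extension: -1i8 widened to i64 has all 64 bits set.
    assert_eq!(builtin_bit_count(-1i8 as i64), 64);
    println!("SWAR and count_ones() agree on all samples");
}
```

Because the narrower signed types are sign-extended to i64 before counting, a negative Int8, Int16, or Int32 input is counted over 64 bits, which is the same result the removed code produced via v.into().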

Contributor

The popcnt instruction is crazy fast -- we tested it in one example where we had a special code path for the no-nulls case, and I was worried that the cost of evaluating the check nulls.count_ones() == 0 would outweigh the improvement.

Nowhere close. 🚀

Hacker's Delight is a classic -- I am not at all surprised that the Rust compiler includes all those tricks (and then some!)
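
A minimal sketch of the kind of no-nulls fast path described above, under stated assumptions: the null mask is a slice of u64 words in which a set bit marks a null element (this layout is hypothetical and is not Arrow's validity bitmap, where a set bit means valid), and the function names null_count and sum_non_null are invented for illustration. It only shows the shape of the check, not the code that was actually benchmarked.

```rust
/// Counts null elements in a hypothetical null mask where bit `i` is 1
/// when element `i` is null. One count_ones() call per 64 elements.
fn null_count(null_mask: &[u64]) -> u32 {
    null_mask.iter().map(|w| w.count_ones()).sum()
}

/// Sums the non-null values, taking a branch-free fast path when the
/// mask reports no nulls at all.
fn sum_non_null(values: &[i64], null_mask: &[u64]) -> i64 {
    if null_count(null_mask) == 0 {
        // Fast path: no per-element null test is needed.
        return values.iter().sum();
    }
    // Slow path: skip every element whose mask bit is set.
    values
        .iter()
        .enumerate()
        .filter(|&(i, _)| (null_mask[i / 64] >> (i % 64)) & 1 == 0)
        .map(|(_, v)| *v)
        .sum()
}

fn main() {
    let values = [1i64, 2, 3, 4];
    // No nulls: the fast path sums everything.
    assert_eq!(sum_non_null(&values, &[0]), 10);
    // Element at index 2 is null: the slow path skips it.
    assert_eq!(sum_non_null(&values, &[0b0100]), 7);
}
```

The check touches one word per 64 elements, so its cost is small relative to the per-element work it can let the fast path skip.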

-// Here’s the equivalent Rust implementation of the bitCount function (similar to Apache Spark's bitCount for LongType)
-// Spark: https://github.com/apache/spark/blob/ac717dd7aec665de578d7c6b0070e8fcdde3cea9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/bitwiseExpressions.scala#L243
-// Java impl: https://github.com/openjdk/jdk/blob/d226023643f90027a8980d161ec6d423887ae3ce/src/java.base/share/classes/java/lang/Long.java#L1584
-fn bit_count(i: i64) -> i32 {
-    let mut u = i as u64;
-    u = u - ((u >> 1) & 0x5555555555555555);
-    u = (u & 0x3333333333333333) + ((u >> 2) & 0x3333333333333333);
-    u = (u + (u >> 4)) & 0x0f0f0f0f0f0f0f0f;
-    u = u + (u >> 8);
-    u = u + (u >> 16);
-    u = u + (u >> 32);
-    (u as i32) & 0x7f
-}
-
 #[cfg(test)]
 mod tests {
     use super::*;