-
Notifications
You must be signed in to change notification settings - Fork 29
Description
The WkbHeader will be a great optimization for a number of things we'd like to do! Another technique potentially worth exploring to optimize these very cheap functions is decomposing the parsing into a series of branchless loops. The compiler has more options to optimize a loop that contains absolutely zero if match or ||/&&s (e.g. SIMD autovectorization). I've prototyped this in C++ before and found order-of-magnitude speed improvements for Binary arrays (probably less helpful for BinaryView arrays, where there is always an if statement to access the data).
let mut n_valid_size = 0;
for item in not_null_array {
n_valid_size += item.len() >= 5;
}
if n_valid_size != not_null_array.len() {
return do_the_slow_version();
}
let mut n_little_endian = 0;
for item in not_null_array {
n_little_endian += item[0] == 0x01;
}
if n_little_endian != not_null_array.len() {
return do_the_slow_version();
}
let mut geometry_types: u32 = 0;
let mut geometry_type_bytes: [u8; 4];
for item in not_null_array {
geometry_type_bytes.copy_from_slice(&item[1..5]);
let geometry_type_id = u32::from_be_bytes(&geometry_type_bytes);
geometry_types |= 1_u32 << geometry_type_id & 0x07;
}
// Potentially do something faster if all we have are pointsThere comes a point where the multiple branchless loops become slower than a single loop with branching...I haven't experimented with this enough to know where that point is.
For functions that only operate on points, there is also a cool optimization you can do for Binary (not BinaryView) arrays: because the data for arrays with zero nulls are all lined up consecutively in memory (i.e., one WKB item after another all in the same buffer), you can loop through X values like this (once you've validated the input using the above):
let mut offset = first_x_offset;
let mut x_bytes: [u8; 8];
for i in 0..array.len() {
x_bytes.copy_from_slice(data_buffer[offset..(offset + 8)]);
let x = f64::from_le_bytes(x_bytes);
offset += 21;
}My initial explorations of this are here: https://github.com/geoarrow/geoarrow-c/blob/28eca0fea6f47c70113dc1719e7597e53bede461/dev/benchmarks/c/wkb_bounding_benchmark.cc#L408-L450 . Specifically for the bounding operation, the initial numbers suggested that this was able to trigger simd for at least the bounding operation.