-
Notifications
You must be signed in to change notification settings - Fork 11
Open
Description
Description
I noticed a TODO comment at hll/coupon_list.go:193:
// TODO there must be a more efficient to reinterpret the byte array as an int array
for it := 0; it < couponCount; it++ {
list.couponIntArr[it] = int(binary.LittleEndian.Uint32(byteArray[listIntArrStart+it*4 : listIntArrStart+it*4+4]))
}I'd like to discuss potential optimizations for this code path.
Research
I investigated how other DataSketches implementations handle this:
| Language | Approach |
|---|---|
| Java | Memory.getIntArray() - bulk copy using Unsafe.copyMemory() |
| C++ | std::memcpy() - direct memory copy |
| Go (current) | Loop with binary.LittleEndian.Uint32() |
Benchmark Results
I ran some benchmarks comparing different Go approaches:
| Method | 8 items | 32 items | 128 items | 1024 items |
|---|---|---|---|---|
| Current (loop) | 27.6 ns | 75.4 ns | 270 ns | 2153 ns |
| binary.Read | 101 ns | 237 ns | 692 ns | 5112 ns |
| Precompute offset | 26.8 ns | 69.9 ns | 253 ns | 2000 ns |
| Loop unrolling | 27.6 ns | 69.7 ns | 248 ns | 1998 ns |
Key findings:
binary.Readis actually slower due to reflection and extra allocations- Pre-computing the offset provides ~7% improvement
- The current implementation is already quite efficient
Proposed Changes
Option A: Minor optimization (pre-compute offset)
base := listIntArrStart
for it := 0; it < couponCount; it++ {
list.couponIntArr[it] = int(binary.LittleEndian.Uint32(byteArray[base:]))
base += 4
}- ~7% performance improvement
- Cleaner code
- Remove or update TODO comment
Option B: Use unsafe package
src := byteArray[listIntArrStart : listIntArrStart+couponCount*4]
int32Slice := unsafe.Slice((*int32)(unsafe.Pointer(&src[0])), couponCount)- Near-zero overhead (similar to C++ memcpy)
- Introduces
unsafedependency - May have portability concerns
Option C: Keep current implementation
The current implementation is already quite efficient. Perhaps just update the TODO to:
// Note: Alternative approaches (binary.Read, unsafe) were benchmarked but did not
// provide significant improvements. The current loop-based approach is optimal for Go.Questions for Maintainers
- Would you accept Option A (minor optimization) as a PR?
- Is the project open to using
unsafefor performance-critical paths? - Should the TODO be updated/removed to reflect that the current approach is adequate?
Environment
- Go version: 1.25.5
- OS: Windows
- CPU: Intel Core i7-10700K
I'm happy to submit a PR for any of these options based on your feedback!
Metadata
Metadata
Assignees
Labels
No labels