Skip to content

[HLL] Question: Optimization of byte-to-int conversion in coupon_list.go (TODO at line 193) #86

@Fengzdadi

Description

@Fengzdadi

Description

I noticed a TODO comment at hll/coupon_list.go:193:

// TODO there must be a more efficient to reinterpret the byte array as an int array
for it := 0; it < couponCount; it++ {
    list.couponIntArr[it] = int(binary.LittleEndian.Uint32(byteArray[listIntArrStart+it*4 : listIntArrStart+it*4+4]))
}

I'd like to discuss potential optimizations for this code path.

Research

I investigated how other DataSketches implementations handle this:

Language Approach
Java Memory.getIntArray() - bulk copy using Unsafe.copyMemory()
C++ std::memcpy() - direct memory copy
Go (current) Loop with binary.LittleEndian.Uint32()

Benchmark Results

I ran some benchmarks comparing different Go approaches:

Method 8 items 32 items 128 items 1024 items
Current (loop) 27.6 ns 75.4 ns 270 ns 2153 ns
binary.Read 101 ns 237 ns 692 ns 5112 ns
Precompute offset 26.8 ns 69.9 ns 253 ns 2000 ns
Loop unrolling 27.6 ns 69.7 ns 248 ns 1998 ns

Key findings:

  1. binary.Read is actually slower due to reflection and extra allocations
  2. Pre-computing the offset provides ~7% improvement
  3. The current implementation is already quite efficient

Proposed Changes

Option A: Minor optimization (pre-compute offset)

base := listIntArrStart
for it := 0; it < couponCount; it++ {
    list.couponIntArr[it] = int(binary.LittleEndian.Uint32(byteArray[base:]))
    base += 4
}
  • ~7% performance improvement
  • Cleaner code
  • Remove or update TODO comment

Option B: Use unsafe package

src := byteArray[listIntArrStart : listIntArrStart+couponCount*4]
int32Slice := unsafe.Slice((*int32)(unsafe.Pointer(&src[0])), couponCount)
  • Near-zero overhead (similar to C++ memcpy)
  • Introduces unsafe dependency
  • May have portability concerns

Option C: Keep current implementation

The current implementation is already quite efficient. Perhaps just update the TODO to:

// Note: Alternative approaches (binary.Read, unsafe) were benchmarked but did not
// provide significant improvements. The current loop-based approach is optimal for Go.

Questions for Maintainers

  1. Would you accept Option A (minor optimization) as a PR?
  2. Is the project open to using unsafe for performance-critical paths?
  3. Should the TODO be updated/removed to reflect that the current approach is adequate?

Environment

  • Go version: 1.25.5
  • OS: Windows
  • CPU: Intel Core i7-10700K

I'm happy to submit a PR for any of these options based on your feedback!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions