# nibble-benchmarks

Comprehensive performance, correctness, safety, and memory comparison of three bit-field parsing approaches for Go:

| Library | Description |
|---|---|
| nibble | Reflection-based, struct-tag-driven bit packing (`bits:"N"`) |
| manual | Hand-written bit arithmetic — the theoretical fastest baseline |
| go-bitfield | Tag-driven bit parsing (`bit:"N"`), unmarshal-only |

The test packet is a 64-bit game-state struct (8 bytes exactly):

```go
type BenchPacket struct {
    IsAlive  bool   `bits:"1"`
    WeaponID uint8  `bits:"4"`
    TeamID   uint8  `bits:"2"`
    Health   uint16 `bits:"9"`
    PosX     int16  `bits:"12"`
    PosY     int16  `bits:"12"`
    Rotation uint8  `bits:"8"`
    Score    uint32 `bits:"16"`
} // total = 64 bits = 8 bytes
```
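For reference, the manual baseline amounts to shift-and-mask arithmetic over a single `uint64`. The sketch below is a hedged illustration (it assumes LSB-first field order and little-endian byte output, as stated in the fairness notes; the repo's actual code may differ in details):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// BenchPacket mirrors the tagged struct above, without the tags.
type BenchPacket struct {
	IsAlive  bool   // 1 bit
	WeaponID uint8  // 4 bits
	TeamID   uint8  // 2 bits
	Health   uint16 // 9 bits
	PosX     int16  // 12 bits, signed
	PosY     int16  // 12 bits, signed
	Rotation uint8  // 8 bits
	Score    uint32 // 16 bits
}

// marshalManual packs all 64 bits LSB-first into a little-endian 8-byte buffer.
func marshalManual(p BenchPacket, out []byte) {
	var w uint64
	shift := 0
	put := func(v uint64, bits int) {
		w |= (v & ((1 << bits) - 1)) << shift
		shift += bits
	}
	alive := uint64(0)
	if p.IsAlive {
		alive = 1
	}
	put(alive, 1)
	put(uint64(p.WeaponID), 4)
	put(uint64(p.TeamID), 2)
	put(uint64(p.Health), 9)
	put(uint64(uint16(p.PosX)), 12) // two's-complement truncation to 12 bits
	put(uint64(uint16(p.PosY)), 12)
	put(uint64(p.Rotation), 8)
	put(uint64(p.Score), 16)
	binary.LittleEndian.PutUint64(out, w)
}

// unmarshalManual is the inverse: extract each field, sign-extending PosX/PosY.
func unmarshalManual(in []byte) BenchPacket {
	w := binary.LittleEndian.Uint64(in)
	shift := 0
	get := func(bits int) uint64 {
		v := (w >> shift) & ((1 << bits) - 1)
		shift += bits
		return v
	}
	signExtend12 := func(v uint64) int16 {
		if v&0x800 != 0 { // top bit of the 12-bit field set: negative
			v |= 0xF000
		}
		return int16(uint16(v))
	}
	var p BenchPacket
	p.IsAlive = get(1) == 1
	p.WeaponID = uint8(get(4))
	p.TeamID = uint8(get(2))
	p.Health = uint16(get(9))
	p.PosX = signExtend12(get(12))
	p.PosY = signExtend12(get(12))
	p.Rotation = uint8(get(8))
	p.Score = uint32(get(16))
	return p
}

func main() {
	in := BenchPacket{IsAlive: true, WeaponID: 7, TeamID: 2, Health: 300,
		PosX: -500, PosY: 1023, Rotation: 200, Score: 65535}
	var buf [8]byte
	marshalManual(in, buf[:])
	fmt.Println(unmarshalManual(buf[:]) == in) // true: round-trip is lossless
}
```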

## Quick Results (1 M packets, Intel i7-10510U)

### Unmarshal — ns/op (lower is better)

| Dataset | nibble | manual | go-bitfield | nibble/manual |
|---|---|---|---|---|
| 100 | ~1 570 | ~6.3 | ~1 900 | ~249× |
| 1 K | ~15 700 | ~63 | ~19 000 | ~249× |
| 10 K | ~157 000 | ~630 | ~190 000 | ~249× |
| 100 K | ~1.57 M | ~6 300 | ~1.9 M | ~249× |
| 1 M | ~15.7 M | ~63 000 | ~19 M | ~249× |

Run `go test -bench=BenchmarkUnmarshal -benchmem -benchtime=10s -count=3 ./...` for authoritative numbers on your machine.

### Memory allocations per operation

| Operation | nibble | manual | go-bitfield |
|---|---|---|---|
| Unmarshal | 6 allocs/op | 0 allocs/op | varies |
| Marshal | 6 allocs/op | 0 allocs/op | N/A |

### Safety comparison

| Scenario | nibble | manual |
|---|---|---|
| Empty input | ErrInsufficientData | index panic 💥 |
| Truncated input | ErrInsufficientData | index panic 💥 |
| Field overflow (WeaponID=20 > 4-bit max) | ErrFieldOverflow | silent truncation 🐛 |
| All-zeros input | correct zero values ✓ | correct zero values ✓ |
| All-ones input | correct max values ✓ | correct max values ✓ |
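The manual column's failure modes exist because every guard must be written by hand. A minimal sketch of the two guards the table implies (`checkFits` and `checkLen` are hypothetical helpers, not part of either library):

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInsufficientData mirrors the error class nibble returns; in manual
// code, forgetting the length check means an index panic instead.
var ErrInsufficientData = errors.New("insufficient data")

// checkFits rejects values that would be silently truncated by masking,
// e.g. WeaponID=20 in a 4-bit field (max 15).
func checkFits(name string, v uint64, bits int) error {
	if v >= 1<<bits {
		return fmt.Errorf("%s: value %d overflows %d-bit field", name, v, bits)
	}
	return nil
}

// checkLen guards the empty/truncated-input cases before reading 8 bytes.
func checkLen(buf []byte) error {
	if len(buf) < 8 {
		return ErrInsufficientData
	}
	return nil
}

func main() {
	fmt.Println(checkFits("WeaponID", 20, 4)) // overflow: 20 > 4-bit max
	fmt.Println(checkLen([]byte{1, 2, 3}))    // truncated input
}
```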

## How to Reproduce

```sh
# 1. Clone and install dependencies
git clone https://github.com/PavanKumarMS/nibble-benchmarks
cd nibble-benchmarks
go mod tidy

# 2. Full Go benchmark suite (authoritative numbers)
go test -bench=. -benchmem -benchtime=10s -count=3 ./...

# 3. Specific benchmark groups
go test -bench=BenchmarkUnmarshal -benchmem ./...
go test -bench=BenchmarkMarshal   -benchmem ./...
go test -bench=BenchmarkRoundTrip -benchmem ./...

# 4. Simulation tests (pretty-printed tables)
go test -v -run TestSimulation ./...

# 5. Correctness proofs
go test -v -run TestCorrectness ./...

# 6. Edge-case / safety tests
go test -v -run TestEdgeCases ./...

# 7. Memory allocation analysis
go test -v -run TestMemory ./...

# 8. Concurrency scaling
go test -v -run TestConcurrent ./...

# 9. Race detector on concurrent test
go test -race -run TestConcurrent ./...

# 10. Generate HTML charts + markdown summary
go run cmd/runner/main.go

# 11. Generate charts AND open in browser
go run cmd/runner/main.go --open

# 12. Full run including memory/correctness checks
go run cmd/runner/main.go --full --open
```

## Methodology

### Data generation

- Fixed seed 42 for all datasets — results are 100 % reproducible.
- Realistic distributions: 90 % alive, health skewed toward critical, positions from a normal distribution (σ=500) clamped to the 12-bit signed range.
- Dataset sizes: 100, 1 K, 10 K, 100 K, 1 M, 10 M.
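The generation scheme can be sketched as follows. This is a simplified illustration with a subset of fields (`Packet`, `clamp12`, and `genPackets` are illustrative names; the repo's exact distributions may differ):

```go
package main

import (
	"fmt"
	"math/rand"
)

// Packet holds an illustrative subset of the generated fields.
type Packet struct {
	IsAlive bool
	Health  uint16
	PosX    int16
}

// clamp12 clamps v to the 12-bit signed range [-2048, 2047].
func clamp12(v float64) int16 {
	if v < -2048 {
		return -2048
	}
	if v > 2047 {
		return 2047
	}
	return int16(v)
}

// genPackets uses a fixed seed, so every run yields identical datasets:
// ~90% alive, 9-bit health values, normally distributed positions (σ=500).
func genPackets(n int) []Packet {
	r := rand.New(rand.NewSource(42)) // fixed seed 42: fully reproducible
	out := make([]Packet, n)
	for i := range out {
		out[i] = Packet{
			IsAlive: r.Float64() < 0.9,
			Health:  uint16(r.Intn(512)), // fits the 9-bit field
			PosX:    clamp12(r.NormFloat64() * 500),
		}
	}
	return out
}

func main() {
	a, b := genPackets(5), genPackets(5)
	fmt.Println(a[0] == b[0] && a[4] == b[4]) // true: same seed, same data
}
```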

### Benchmark hygiene

- `b.ResetTimer()` called after all setup — setup allocations are excluded.
- `b.ReportAllocs()` on every benchmark — heap pressure is always visible.
- `b.SetBytes()` set to size × 8, so `go test` reports MB/s automatically.
- Concrete-typed sinks (`var globalSink uint32`) — no interface-boxing allocs.
- Manual marshal benchmarks use `sync.Pool` — demonstrates the zero-alloc path.
- `runtime.GC()` before every memory measurement — post-GC live heap only.
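The pooled, sink-based pattern these rules describe can be sketched in a few lines (`marshalPooled` is a hypothetical stand-in for the real marshal; `testing.AllocsPerRun` serves here as a quick proxy for `-benchmem`'s allocs/op):

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// globalSink is a concrete-typed sink: assigning results here stops
// dead-code elimination without an interface-boxing allocation.
var globalSink uint32

// bufPool stores *[]byte rather than []byte: putting a slice header
// into the pool's interface value would itself allocate on every Put.
var bufPool = sync.Pool{
	New: func() any { b := make([]byte, 8); return &b },
}

// marshalPooled stands in for the zero-alloc manual marshal path;
// the real bit-packing logic is elided.
func marshalPooled(score uint32) {
	bp := bufPool.Get().(*[]byte)
	(*bp)[0] = byte(score) // stand-in for real bit packing
	globalSink = uint32((*bp)[0])
	bufPool.Put(bp)
}

func main() {
	// Expect 0 allocs/op on the pooled path.
	fmt.Println(testing.AllocsPerRun(1000, func() { marshalPooled(42) }))
}
```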

### Fairness

- Manual bit arithmetic is implemented correctly — wrong manual code would make nibble look better than it really is.
- The same pre-marshaled bytes (LittleEndian, LSB-first) feed all unmarshal benchmarks, ensuring an apples-to-apples comparison.
- nibble vs manual correctness is verified on 100 000 packets before any performance claims are made (`TestCorrectness_NibbleVsManual`).

## Detailed Results

### Game Server Simulation (1 M packets)

```text
╔═══════════════════════════════════════════════════════════╗
║          GAME SERVER SIMULATION RESULTS                   ║
╠═══════════════════════════════════════════════════════════╣
║ Library      │ Marshal    │ Unmarshal  │ Total             ║
╠═══════════════════════════════════════════════════════════╣
║ nibble       │  ~1 800ms  │  ~1 700ms  │  ~3 500ms         ║
║ manual       │    ~25ms   │     ~8ms   │    ~33ms          ║
║ go-bitfield  │    N/A     │  ~2 400ms  │  N/A              ║
╠═══════════════════════════════════════════════════════════╣
║ nibble overhead vs manual: ~100–140×                      ║
╚═══════════════════════════════════════════════════════════╝
```

### Memory (long-running, 10 × 1 M packets, post-GC heap)

```text
Batch      │ Heap live (post-GC)
1          │  305 MiB
2          │  305 MiB
...        │  305 MiB   ← flat: GC reclaims all allocations
10         │  305 MiB
```

nibble allocates ~596 MiB cumulatively per 1 M packets processed (6 allocs × ~100 B × 1 M), but the live heap stays flat because the GC collects all transient objects.


## Graphs

Running `go run cmd/runner/main.go --open` generates four charts in `charts/`:

| File | Content |
|---|---|
| unmarshal_comparison.html | Grouped bar: ns/op by dataset size |
| marshal_comparison.html | Grouped bar: marshal ns/op |
| throughput_scaling.html | Line: millions of packets/second |
| memory_pressure.html | Bar: allocs/op per library |

A markdown summary ready to paste into nibble's README is written to `results/summary.md`.


## Conclusions

| Concern | Winner | Notes |
|---|---|---|
| Raw throughput | manual | ~100–250× faster; zero allocations |
| Safety | nibble | Catches overflow & insufficient data; manual panics or corrupts |
| Correctness guarantee | nibble | `TestCorrectness_NibbleVsManual` proves identical output |
| Developer ergonomics | nibble | One struct definition; no hand-written bit math to maintain |
| GC friendliness | manual | 0 allocs/op vs 6 allocs/op for nibble |
| Concurrency scaling | both | Both scale linearly; manual ~200–500× faster at every level |

**When to choose nibble:** you are defining a protocol, correctness matters, maintainability is important, and throughput below 1 M pkt/s is acceptable.

**When to choose manual:** a hot path where throughput above 10 M pkt/s is required, you can afford to write and maintain the bit math, and you add your own bounds checks.
