Comprehensive performance, correctness, safety, and memory comparison of three bit-field parsing approaches for Go:
| Library | Description |
|---|---|
| nibble | Reflection-based, struct-tag-driven bit packing (`bits:"N"`) |
| manual | Hand-written bit arithmetic — the theoretical fastest baseline |
| go-bitfield | Tag-driven bit parsing (`bit:"N"`), unmarshal-only |
The test packet is a 64-bit game-state struct (8 bytes exactly):

```go
type BenchPacket struct {
	IsAlive  bool   `bits:"1"`
	WeaponID uint8  `bits:"4"`
	TeamID   uint8  `bits:"2"`
	Health   uint16 `bits:"9"`
	PosX     int16  `bits:"12"`
	PosY     int16  `bits:"12"`
	Rotation uint8  `bits:"8"`
	Score    uint32 `bits:"16"`
} // total = 64 bits = 8 bytes
```

| Dataset | nibble | manual | go-bitfield | nibble/manual |
|---|---|---|---|---|
| 100 | ~1 570 | ~6.3 | ~1 900 | ~249× |
| 1 K | ~15 700 | ~63 | ~19 000 | ~249× |
| 10 K | ~157 000 | ~630 | ~190 000 | ~249× |
| 100 K | ~1.57 M | ~6 300 | ~1.9 M | ~249× |
| 1 M | ~15.7 M | ~63 000 | ~19 M | ~249× |
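For reference, the `manual` baseline in these tables amounts to hand-written shifts and masks over a single little-endian, LSB-first `uint64`. The sketch below follows the field widths of the struct above; it is an illustration of the technique, not the benchmark's exact code:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// BenchPacket mirrors the 64-bit game-state struct above (tags omitted).
type BenchPacket struct {
	IsAlive  bool   // 1 bit
	WeaponID uint8  // 4 bits
	TeamID   uint8  // 2 bits
	Health   uint16 // 9 bits
	PosX     int16  // 12 bits, signed
	PosY     int16  // 12 bits, signed
	Rotation uint8  // 8 bits
	Score    uint32 // 16 bits
}

var ErrInsufficientData = errors.New("insufficient data")

// marshalManual packs the fields LSB-first into a little-endian uint64.
func marshalManual(p *BenchPacket, b []byte) {
	var v uint64
	if p.IsAlive {
		v |= 1
	}
	v |= uint64(p.WeaponID&0xF) << 1
	v |= uint64(p.TeamID&0x3) << 5
	v |= uint64(p.Health&0x1FF) << 7
	v |= uint64(uint16(p.PosX)&0xFFF) << 16
	v |= uint64(uint16(p.PosY)&0xFFF) << 28
	v |= uint64(p.Rotation) << 40
	v |= uint64(p.Score&0xFFFF) << 48
	binary.LittleEndian.PutUint64(b, v)
}

// unmarshalManual reverses marshalManual, sign-extending the 12-bit fields.
func unmarshalManual(b []byte, p *BenchPacket) error {
	if len(b) < 8 {
		return ErrInsufficientData // the bounds check manual code must not skip
	}
	v := binary.LittleEndian.Uint64(b)
	p.IsAlive = v&1 != 0
	p.WeaponID = uint8(v >> 1 & 0xF)
	p.TeamID = uint8(v >> 5 & 0x3)
	p.Health = uint16(v >> 7 & 0x1FF)
	// Shift left then arithmetic-shift right to sign-extend a 12-bit value.
	p.PosX = int16(uint16(v>>16&0xFFF)<<4) >> 4
	p.PosY = int16(uint16(v>>28&0xFFF)<<4) >> 4
	p.Rotation = uint8(v >> 40)
	p.Score = uint32(v >> 48 & 0xFFFF)
	return nil
}

func main() {
	p := BenchPacket{IsAlive: true, WeaponID: 9, Health: 300, PosX: -100, PosY: 2047, Rotation: 200, Score: 54321}
	var buf [8]byte
	marshalManual(&p, buf[:])
	var q BenchPacket
	err := unmarshalManual(buf[:], &q)
	fmt.Println(err == nil && q == p) // round-trip preserves every field
}
```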
Run `go test -bench=BenchmarkUnmarshal -benchmem -benchtime=10s -count=3 ./...` for authoritative numbers on your machine.
| Operation | nibble | manual | go-bitfield |
|---|---|---|---|
| Unmarshal | 6 allocs/op | 0 allocs/op | varies |
| Marshal | 6 allocs/op | 0 allocs/op | N/A |
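The zero-alloc marshal row for `manual` relies on buffer reuse. A minimal sketch of that pattern with `sync.Pool` (names like `bufPool` and `marshalPooled` are illustrative, not the benchmark's actual identifiers):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sync"
	"testing"
)

// bufPool hands out reusable 8-byte buffers. Pooling a pointer to an array
// (rather than a slice) keeps the Get/Put round-trip itself allocation-free.
var bufPool = sync.Pool{
	New: func() any { return new([8]byte) },
}

// marshalPooled writes a pre-packed 64-bit packet into a pooled buffer.
func marshalPooled(packed uint64) *[8]byte {
	buf := bufPool.Get().(*[8]byte)
	binary.LittleEndian.PutUint64(buf[:], packed)
	return buf
}

func main() {
	// testing.AllocsPerRun reports steady-state heap allocations per call.
	allocs := testing.AllocsPerRun(1000, func() {
		buf := marshalPooled(0xDEADBEEF)
		bufPool.Put(buf) // return the buffer so the next call reuses it
	})
	fmt.Println(allocs == 0)
}
```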
| Scenario | nibble | manual |
|---|---|---|
| Empty input | ErrInsufficientData ✓ | index panic 💥 |
| Truncated input | ErrInsufficientData ✓ | index panic 💥 |
| Field overflow (WeaponID=20 > 4-bit max) | ErrFieldOverflow ✓ | silent truncation 🐛 |
| All-zeros input | correct zero values ✓ | correct zero values ✓ |
| All-ones input | correct max values ✓ | correct max values ✓ |
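The "silent truncation" row is easy to reproduce: masking a value to the field width simply discards the high bits. A tiny sketch of the 🐛 (values illustrative):

```go
package main

import "fmt"

func main() {
	weaponID := uint8(20) // exceeds the 4-bit maximum of 15
	packed := weaponID & 0xF
	// Masking silently turns 20 (0b10100) into 4 (0b00100); no error is raised.
	fmt.Println(packed) // → 4
}
```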
```shell
# 1. Clone and install dependencies
git clone https://github.com/PavanKumarMS/nibble-benchmarks
cd nibble-benchmarks
go mod tidy

# 2. Full Go benchmark suite (authoritative numbers)
go test -bench=. -benchmem -benchtime=10s -count=3 ./...

# 3. Specific benchmark groups
go test -bench=BenchmarkUnmarshal -benchmem ./...
go test -bench=BenchmarkMarshal -benchmem ./...
go test -bench=BenchmarkRoundTrip -benchmem ./...

# 4. Simulation tests (pretty-printed tables)
go test -v -run TestSimulation ./...

# 5. Correctness proofs
go test -v -run TestCorrectness ./...

# 6. Edge-case / safety tests
go test -v -run TestEdgeCases ./...

# 7. Memory allocation analysis
go test -v -run TestMemory ./...

# 8. Concurrency scaling
go test -v -run TestConcurrent ./...

# 9. Race detector on concurrent test
go test -race -run TestConcurrent ./...

# 10. Generate HTML charts + markdown summary
go run cmd/runner/main.go

# 11. Generate charts AND open in browser
go run cmd/runner/main.go --open

# 12. Full run including memory/correctness checks
go run cmd/runner/main.go --full --open
```

- Fixed seed 42 for all datasets — results are 100 % reproducible.
- Realistic distributions: 90 % alive, health skewed toward critical, positions from a normal distribution (σ=500) clamped to the 12-bit signed range.
- Dataset sizes: 100, 1 K, 10 K, 100 K, 1 M, 10 M.
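A sketch of how such a generator can look. Only the fixed seed 42, the 90 % alive rate, and the σ=500 normal distribution clamped to the 12-bit signed range are stated above; the function names and `math/rand` usage are assumptions about the harness:

```go
package main

import (
	"fmt"
	"math/rand"
)

// clamp12 restricts a sample to the 12-bit signed range [-2048, 2047].
func clamp12(v float64) int16 {
	switch {
	case v < -2048:
		return -2048
	case v > 2047:
		return 2047
	}
	return int16(v)
}

func main() {
	r := rand.New(rand.NewSource(42))      // fixed seed: every run yields the same dataset
	alive := r.Float64() < 0.9             // ~90 % of packets are alive
	posX := clamp12(r.NormFloat64() * 500) // normal distribution, σ=500, clamped
	fmt.Println(alive, posX >= -2048 && posX <= 2047)
}
```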
- `b.ResetTimer()` called after all setup — setup allocations are excluded.
- `b.ReportAllocs()` on every benchmark — heap pressure is always visible.
- `b.SetBytes()` set to `size × 8` — `go test` reports MB/s automatically.
- Concrete-typed sinks (`var globalSink uint32`) — no interface-boxing allocs.
- Manual marshal benchmarks use `sync.Pool` — demonstrates zero-alloc path.
- `runtime.GC()` before every memory measurement — post-GC live heap only.
- Manual bit arithmetic is implemented correctly — wrong manual code would make nibble look better than it really is.
- The same pre-marshaled bytes (LittleEndian, LSB-first) feed all unmarshal benchmarks, ensuring apples-to-apples comparison.
- nibble vs manual correctness is verified on 100 000 packets before any performance claims are made (`TestCorrectness_NibbleVsManual`).
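The hygiene points above fit together in a skeleton like the following. The measured body is a stand-in for the real unmarshal loop, and `testing.Benchmark` lets the sketch run outside `go test`:

```go
package main

import (
	"fmt"
	"testing"
)

// Concrete-typed sink: keeps the compiler from eliminating the measured
// work without the interface-boxing alloc a generic sink would cause.
var globalSink uint32

func main() {
	const size = 1000
	data := make([]byte, size*8) // setup: stand-in for pre-marshaled packets

	res := testing.Benchmark(func(b *testing.B) {
		b.SetBytes(size * 8) // go test derives MB/s from this
		b.ReportAllocs()     // surface heap pressure in every result
		b.ResetTimer()       // everything above is setup; exclude it
		for i := 0; i < b.N; i++ {
			var sum uint32
			for _, x := range data {
				sum += uint32(x)
			}
			globalSink = sum // concrete-typed assignment, no boxing
		}
	})
	fmt.Println(res.AllocsPerOp())
}
```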
```text
╔═══════════════════════════════════════════════════════════╗
║              GAME SERVER SIMULATION RESULTS               ║
╠═══════════════════════════════════════════════════════════╣
║ Library      │ Marshal   │ Unmarshal │ Total              ║
╠═══════════════════════════════════════════════════════════╣
║ nibble       │ ~1 800ms  │ ~1 700ms  │ ~3 500ms           ║
║ manual       │ ~25ms     │ ~8ms      │ ~33ms              ║
║ go-bitfield  │ N/A       │ ~2 400ms  │ N/A                ║
╠═══════════════════════════════════════════════════════════╣
║ nibble overhead vs manual: ~100–140×                      ║
╚═══════════════════════════════════════════════════════════╝
```
```text
Batch │ Heap live (post-GC)
    1 │ 305 MiB
    2 │ 305 MiB
  ... │ 305 MiB   ← flat: GC reclaims all allocations
   10 │ 305 MiB
```
nibble allocates ~596 MiB cumulatively per 1 M packets processed (6 allocs × ~100 B × 1 M), but the live heap stays flat because the GC collects all transient objects.
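A sketch of the post-GC measurement technique behind that table (sizes here are scaled down, and `liveHeap` is an illustrative name, not the harness's function):

```go
package main

import (
	"fmt"
	"runtime"
)

// liveHeap forces a collection, then reports only surviving heap bytes,
// so transient garbage never inflates the measurement.
func liveHeap() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	before := liveHeap()

	// Allocate ~8 MiB of transient buffers, then drop every reference.
	transient := make([][]byte, 1024)
	for i := range transient {
		transient[i] = make([]byte, 8*1024)
	}
	transient = nil
	_ = transient

	after := liveHeap()
	// The live heap returns to roughly its starting level: the GC
	// reclaimed the transient allocations, matching the flat batch table.
	fmt.Println(after < before+1<<20)
}
```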
Running `go run cmd/runner/main.go --open` generates the following charts in `charts/`:
| File | Content |
|---|---|
| `unmarshal_comparison.html` | Grouped bar: ns/op by dataset size |
| `marshal_comparison.html` | Grouped bar: marshal ns/op |
| `throughput_scaling.html` | Line: millions of packets/second |
| `memory_pressure.html` | Bar: allocs/op per library |
A markdown summary ready to paste into nibble's README is written to `results/summary.md`.
| Concern | Winner | Notes |
|---|---|---|
| Raw throughput | manual | ~100–250× faster; zero allocations |
| Safety | nibble | Catches overflow & insufficient data; manual panics or corrupts |
| Correctness guarantee | nibble | `TestCorrectness_NibbleVsManual` proves identical output |
| Developer ergonomics | nibble | One struct definition; no hand-written bit math to maintain |
| GC friendliness | manual | 0 allocs/op vs 6 allocs/op for nibble |
| Concurrency scaling | both | Both scale linearly; manual ~200–500× faster at every level |
When to choose nibble: the wire format is defined once as a tagged struct, correctness and maintainability matter, and throughput below ~1 M pkt/s is acceptable.
When to choose manual: the parser sits on a hot path, you need more than ~10 M pkt/s, and you can afford to write, maintain, and bounds-check the bit math yourself.