feat: streaming GGUF → bgz17 indexer + OpenChat 679× compression by AdaWorldAPI · Pull Request #48 · AdaWorldAPI/ndarray

AdaWorldAPI · 2026-03-30T00:10:10Z

Summary

Streaming GGUF → bgz17 indexer (gguf_indexer.rs, 525 LOC): reads GGUF tensor-by-tensor via seek, projects each weight matrix to Base17 via golden-step averaging, writes compressed BGZ7 output. Peak RAM = one tensor, regardless of model size.
OpenChat 3.5 Q8_0 proven: 7.7 GB → 41 MB (679× overall). Attention 328×, FeedForward 920×, Embedding 3765×. Peak RAM 524 MB. 185 seconds.
f16 subnormal fix: signed arithmetic for exponent bias (113 + e), no overflow.
Layer classification: Attention, FFN, Conv2D, Embedding, Norm — auto-detected from tensor names. Conv2D [out_ch, in_ch, kH, kW] reshaped to out_ch vectors of kernel_dim.
BGZ7 artifact: src/hpc/openchat/weights/openchat-3.5-0106.bgz7 (41 MB)

Results

OpenChat 3.5 Q8_0 (7B params, Mistral architecture)
  Input:       7.70 GB
  Output:      42.62 MB
  Ratio:       679.7×

  Attention    129 tensors:  5.89 GB → 17.97 MB  (328×)
  FeedForward   96 tensors: 22.55 GB → 24.51 MB  (920×)
  Embedding      1 tensor:   524 MB →  0.14 MB  (3765×)

Test plan

14 unit tests: classification, projection, reshape, synthetic GGUF end-to-end
5 gguf.rs tests: header parsing, f16/bf16 conversion, Q8_0 dequant
Integration test on real OpenChat 3.5 Q8_0 (7.7 GB, --include-ignored)
Next: Llama 3.2 1B, SD 3.5 Large (DiT), SD 1.5 (Conv2D)

https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7

Streaming indexer output: 226 tensors indexed, 65 skipped (norms/biases). Attention 328×, FeedForward 920×, Embedding 3765×. Peak RAM 524 MB. https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7

f16 bias=15, f32 bias=127. Subnormal exponent = 1-15 = -14. After mantissa normalization: f32_exp = 127 + (1-15) + e = 113 + e. Minimum e = -10 → f32_exp = 103. Always valid, no clamp needed. https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7

claude added 2 commits March 30, 2026 00:03

data: OpenChat 3.5 Q8_0 → bgz17 compressed (7.7 GB → 41 MB, 679×)

9f12491

Streaming indexer output: 226 tensors indexed, 65 skipped (norms/biases). Attention 328×, FeedForward 920×, Embedding 3765×. Peak RAM 524 MB. https://claude.ai/code/session_01Y69Vnw751w75iVSBRws7o7

AdaWorldAPI merged commit a97d162 into master Mar 30, 2026
5 of 14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: streaming GGUF → bgz17 indexer + OpenChat 679× compression#48

feat: streaming GGUF → bgz17 indexer + OpenChat 679× compression#48
AdaWorldAPI merged 2 commits into
masterfrom
claude/transcode-deepnsm-rust-oNa1Z

AdaWorldAPI commented Mar 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AdaWorldAPI commented Mar 30, 2026

Summary

Results

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants