Release v0.8.1 — TQ35-PRIMARY (329K tokens) · Aitherium/aitherkvcache

What's New

TQ35-PRIMARY mode: 329,072 tokens on a single consumer GPU (4x compression over standard fp8 pages)
CUDA graph diagnosis: identified graph capture corruption as the root cause of gibberish output after speculative decoding experiments
Recommended config: AITHER_TQ_MODE=tq4-primary + AITHER_TQ_EAGER=1 — 309K tokens, coherent output, 6-8 tok/s
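
As a rough check on the figures above, the arithmetic can be sketched in Python. This assumes the "4x" ratio is exact and that the 329,072 and 309K figures were measured under the same GPU memory budget; the notes state neither, so treat the results as estimates.

```python
# Figures taken from the release notes.
TQ35_TOKENS = 329_072   # tq35-primary capacity
TQ4_TOKENS = 309_000    # tq4-primary capacity ("309K" in the notes, rounded)
COMPRESSION = 4         # "4x" over standard fp8 pages, treated as exact (assumption)

# Implied capacity of standard fp8 pages in the same memory budget.
fp8_baseline = TQ35_TOKENS / COMPRESSION

# Fraction of capacity given up by choosing tq4-primary over tq35-primary.
capacity_cost = 1 - TQ4_TOKENS / TQ35_TOKENS

print(f"implied fp8 baseline: {fp8_baseline:,.0f} tokens")
print(f"capacity given up by tq4-primary: {capacity_cost:.1%}")
```

Under these assumptions, the recommended config gives up a small share of capacity in exchange for the faster decode rate quoted above.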

Configuration

# Maximum capacity (slower decode, ~2-5 tok/s)
AITHER_TQ_MODE=tq35-primary

# Best balance (309K tokens, ~6-8 tok/s, coherent)  
AITHER_TQ_MODE=tq4-primary
AITHER_TQ_EAGER=1

pip install aither-kvcache==0.8.1
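
The two configurations above can also be applied from Python when launching a worker process. This is a minimal sketch: it assumes the variables are read from the process environment at startup. The profile names and the `build_env` helper are hypothetical and are not part of aither-kvcache. Clearing AITHER_TQ_EAGER for the maximum-capacity profile is also an assumption, based only on the notes listing that mode without it.

```python
import os

# Values copied from the release notes; the profile names are hypothetical labels.
PROFILES = {
    "max-capacity": {"AITHER_TQ_MODE": "tq35-primary"},
    "balanced": {"AITHER_TQ_MODE": "tq4-primary", "AITHER_TQ_EAGER": "1"},
}

MANAGED_KEYS = ("AITHER_TQ_MODE", "AITHER_TQ_EAGER")


def build_env(profile, base=None):
    """Return a copy of the environment with one profile applied."""
    env = dict(os.environ if base is None else base)
    # Remove stale settings so a previous profile cannot leak through.
    for key in MANAGED_KEYS:
        env.pop(key, None)
    env.update(PROFILES[profile])
    return env


if __name__ == "__main__":
    for name in PROFILES:
        env = build_env(name, base={})
        print(name, {key: env[key] for key in MANAGED_KEYS if key in env})
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting whatever server command is in use.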
