RWKV-7 streaming trainer with online dynamic tokenization, state-tuning, and surprise-prioritized replay. Single Rust binary. Trains on Claude Code traces via ccsniff.
- Model: RWKV-7 "Goose" 1.5B (RWKV/RWKV7-Goose-World3-1.5B-HF) via candle-transformers.
- Sub-quadratic: O(L) inference with a fixed-size hidden state, so arbitrarily long streams run at constant memory.
- State-tuning: gradients route only through per-layer recurrent state + a hypernetwork for dynamic-token embeddings. Pretrained weights frozen. ~6 GB VRAM target on CPU/CUDA.
- Online dynamic tokenization: LZW-inspired bigram merging (akin to zip2zip). A sliding window holds 512 bigrams; a bigram seen ≥3 times becomes a virtual token, whose embedding is produced by a 2-layer hypernetwork over its constituent base-token embeddings; LRU cache of capacity 256.
- Surprise-prioritized replay: ring buffer (max 1000) sampled with probability ∝ surprise — see SuRe.
- Data source: ccsniff streams Claude Code JSONL events into a typed Trace enum via a tokio subprocess + bounded mpsc channel.
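The bigram-promotion rule described in the list above (sliding window, promotion threshold, merge into virtual tokens) can be sketched roughly as follows. This is a minimal illustration and not the sttx-core implementation: `BigramPromoter`, `observe`, and `merge` are hypothetical names, counting occurrences only inside the current window is an assumption, and the hypernetwork embeddings and the LRU cache are left out entirely.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical sketch of sliding-window bigram promotion (not the sttx-core API).
pub struct BigramPromoter {
    window: VecDeque<(u32, u32)>,       // most recent bigrams, oldest at the front
    counts: HashMap<(u32, u32), usize>, // occurrences inside the current window (assumption)
    promoted: HashMap<(u32, u32), u32>, // bigram -> virtual token id
    window_cap: usize,
    threshold: usize,
    next_virtual_id: u32,
}

impl BigramPromoter {
    /// `base_vocab` is the first id handed out to a virtual token.
    pub fn new(window_cap: usize, threshold: usize, base_vocab: u32) -> Self {
        assert!(window_cap > 0, "window must hold at least one bigram");
        Self {
            window: VecDeque::with_capacity(window_cap),
            counts: HashMap::new(),
            promoted: HashMap::new(),
            window_cap,
            threshold,
            next_virtual_id: base_vocab,
        }
    }

    /// Record one bigram. Returns `Some(id)` only on the observation that promotes it.
    pub fn observe(&mut self, a: u32, b: u32) -> Option<u32> {
        // Slide the window: forget the oldest bigram once the window is full.
        if self.window.len() == self.window_cap {
            if let Some(old) = self.window.pop_front() {
                if let Some(c) = self.counts.get_mut(&old) {
                    *c -= 1;
                    if *c == 0 {
                        self.counts.remove(&old);
                    }
                }
            }
        }
        self.window.push_back((a, b));
        let count = {
            let c = self.counts.entry((a, b)).or_insert(0);
            *c += 1;
            *c
        };
        if count >= self.threshold && !self.promoted.contains_key(&(a, b)) {
            let id = self.next_virtual_id;
            self.next_virtual_id += 1;
            self.promoted.insert((a, b), id);
            return Some(id);
        }
        None
    }

    /// Greedy left-to-right replacement of promoted bigrams by their virtual tokens.
    pub fn merge(&self, tokens: &[u32]) -> Vec<u32> {
        let mut out = Vec::with_capacity(tokens.len());
        let mut i = 0;
        while i < tokens.len() {
            if i + 1 < tokens.len() {
                if let Some(&id) = self.promoted.get(&(tokens[i], tokens[i + 1])) {
                    out.push(id);
                    i += 2;
                    continue;
                }
            }
            out.push(tokens[i]);
            i += 1;
        }
        out
    }
}
```

With the values quoted above, a promoter would be built as `BigramPromoter::new(512, 3, vocab_size)`, where `vocab_size` stands for the base tokenizer's vocabulary size. Promotions stay in place after their bigram leaves the window; in the real system an eviction policy such as the LRU cache would presumably bound them.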
sttx-core/ observability, dynamic tokenizer, replay buffer, trace types
sttx-ccsniff/ ccsniff subprocess adapter
sttx-train/ model load, state-tuning training loop, checkpoints
sttx-cli/ streamtts.exe binary, clap subcommands
tests/ single integration test (cargo test --release)
cargo build --release
./target/release/streamtts --help
./target/release/streamtts train --ccsniff-from live --steps 1000 --checkpoint-dir ./ckpt
./target/release/streamtts inspect --checkpoint ./ckpt/latest
./target/release/streamtts merge-stats --checkpoint ./ckpt/latest
cargo run --release --example smoke_load
cargo run --release --example smoke_train
smoke_load downloads the 1.5B safetensors (~3 GB) into the HF cache, instantiates rwkv_v7::Model, and runs a single forward_seq to print vocab_size × first-5-logits. smoke_train runs 5 state-tuning steps over a synthetic stream, saves a checkpoint to ckpt-smoke/, and prints (steps, losses, merges, promotions).
cargo test --workspace --release
8 test binaries, all green:
- sttx-core unit tests (5): observability JSONL round-trip under concurrent writes, replay-buffer surprise-weighted sampling + low-surprise eviction, dynamic-tokenizer bigram promotion + merge, hypernetwork forward shape.
- sttx-cli integration (2 groups, 23 assertions): data-plane (obs + replay + tokenizer + trace + hypernetwork) and system-plane (ccsniff JSONL parser exercised against test-data/ccsniff-fixture.ndjson, 40 real Claude Code events captured via npx ccsniff -f --json --limit 40).
- sttx-cli train_witness (1): real rwkv_v7::Model::new instantiated from a synthesized random-init safetensors at a tiny config (vocab=64, hidden=32, layers=2, head=16), real forward_seq produces logits of correct shape, real per-layer state tensors, real 8-step Trainer::step_on_trace over a synthetic stream, real AdamW.backward_step, real VarMap.save/load_meta round-trip. Exercises every code path the production loop uses against pretrained 1.5B weights; only the bytes differ.
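The replay-buffer behaviour named in the unit tests above (surprise-weighted sampling plus low-surprise eviction) could look roughly like the sketch below. It is an illustration under stated assumptions, not the sttx-core code: `Entry`, `ReplayBuffer`, `push`, and `sample` are hypothetical names, the eviction rule (replace the least surprising entry when full) is a guess at what "low-surprise eviction" means, and the caller supplies the uniform random number `u` so that the example needs no RNG crate.

```rust
/// Hypothetical replay entry; `payload` stands in for whatever the trainer stores.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub payload: String,
    pub surprise: f64,
}

/// Hypothetical bounded buffer with surprise-proportional sampling.
pub struct ReplayBuffer {
    entries: Vec<Entry>,
    capacity: usize,
}

impl ReplayBuffer {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { entries: Vec::with_capacity(capacity), capacity }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Insert an entry. When full, replace the least surprising entry,
    /// but only if the newcomer is more surprising than it (assumed policy).
    pub fn push(&mut self, e: Entry) {
        if self.entries.len() < self.capacity {
            self.entries.push(e);
            return;
        }
        let lowest = self
            .entries
            .iter()
            .enumerate()
            .min_by(|a, b| a.1.surprise.total_cmp(&b.1.surprise))
            .map(|(i, x)| (i, x.surprise));
        if let Some((idx, min_surprise)) = lowest {
            if e.surprise > min_surprise {
                self.entries[idx] = e;
            }
        }
    }

    /// Pick an entry with probability proportional to its surprise.
    /// `u` is a uniform sample in [0, 1) provided by the caller.
    pub fn sample(&self, u: f64) -> Option<&Entry> {
        if self.entries.is_empty() {
            return None;
        }
        let u = u.clamp(0.0, 1.0);
        let total: f64 = self.entries.iter().map(|e| e.surprise.max(0.0)).sum();
        if total <= 0.0 {
            // No positive weights: fall back to uniform selection.
            let i = ((u * self.entries.len() as f64) as usize).min(self.entries.len() - 1);
            return self.entries.get(i);
        }
        // Walk the cumulative weights until the target falls inside one entry's share.
        let mut target = u * total;
        for e in &self.entries {
            let w = e.surprise.max(0.0);
            if target < w {
                return Some(e);
            }
            target -= w;
        }
        self.entries.last()
    }
}
```

With the limit quoted in the feature list, the buffer would be created as `ReplayBuffer::new(1000)`. Taking `u` as a parameter keeps sampling deterministic under test; a real trainer would draw it from whatever RNG it already uses.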
MIT