Kodachi is a C++ / GGML ASR research repository for CPU-first speech recognition, with RISC-V as the main deployment target.
The current practical path is ZipFormer streaming ASR:
- Chinese-English ZipFormer RNNT runtime in C++ and GGML
- x86 host bring-up and profiling
- RISC-V GCC 14 cross compilation
- RVV 1.0 + GGML Q4_0 repack experiments
- selective quantization for the ZipFormer encoder
- local websocket service compatible with Shinsoku-style Soniox and Bailian streaming backends
- systemd deployment on the RISC-V board
The repository also keeps the earlier Qwen3-ASR / ForcedAligner CPU work as a research baseline. That path includes K-Quant, TurboQuant, rotated tiled experiments, and RVV bring-up, but it is not the recommended real-time service target on the current board.
Qwen3-ASR 0.6B is too large for real-time local ASR on the current RISC-V board. The best measured GCC 14 + RVV Qwen path still ran at a wall-clock RTF of about 4.57 on a short sample, roughly 4.6 times slower than real time.
ZipFormer is the active deployment path. The current selective-quant ZipFormer model keeps decoder and joiner in Q8_0 while quantizing encoder 2D weights to Q4_0. On the RISC-V board this produced correct output on the tested sample at about:
wall RTF: 0.4802
encoder RTF: 0.3483
peak RSS: 376908 KB
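For reading these figures: real-time factor (RTF) is conventionally processing time divided by audio duration, so values below 1.0 are faster than real time. The Python sketch below assumes that definition. The function names and the 10-second clip are illustrative and are not taken from the repository's benchmark scripts.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (the usual RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time_for(rtf: float, audio_seconds: float) -> float:
    """Invert the definition: time needed for a clip of the given length at a given RTF."""
    return rtf * audio_seconds


# Hypothetical 10-second clip at the ZipFormer wall RTF reported above.
zipformer_seconds = processing_time_for(0.4802, 10.0)  # about 4.8 s
# The same clip at the Qwen3-ASR wall RTF quoted earlier in this README.
qwen_seconds = processing_time_for(4.57, 10.0)         # about 45.7 s
```

Under this reading, the ZipFormer path leaves roughly half of each audio second as headroom, while the Qwen path falls behind the incoming audio.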
See implementation_report_2026-04-13.md for the full implementation and benchmark history.
src/
  qwen3_asr.*                      Qwen3-ASR orchestration
  forced_aligner.*                 Qwen3 forced aligner
  audio_encoder.*                  Qwen3 audio encoder
  text_decoder.*                   Qwen text decoder
  turboquant_runtime.*             TurboQuant runtime experiment
  zipformer_*.{h,cpp}              ZipFormer GGML runtime
tools/
  zipformer_cli.cpp                Offline ZipFormer CLI
  zipformer_streaming_server.cpp   Websocket streaming ASR service
  zipformer_onnx_to_gguf.py        ZipFormer ONNX to GGUF converter
  requantize_gguf.cpp              GGUF quantization helper
scripts/
  build_riscv64_gcc14_cross.sh
  selective_zipformer_quant.py
  benchmark_*.py
  turboquant_*.py
docs/
  implementation_report_2026-04-13.md
  zipformer_streaming_server.md
packaging/
  zipformer-streaming.service
ggml/
  Vendored GGML source
third_party/nlohmann/
  Header-only JSON dependency for the websocket server
Large artifacts are intentionally not tracked:
- GGUF models
- ONNX exports
- WAV/audio corpora
- build directories
- benchmark dumps
- temporary converted models
Configure and build the ZipFormer CLI and streaming server:
cmake -S . -B build-host-zipformer -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DQWEN3_ASR_NATIVE_ARCH=ON \
  -DQWEN3_ASR_TIMING=ON \
  -DGGML_CCACHE=OFF
cmake --build build-host-zipformer \
  --target zipformer-cli zipformer-streaming-server test_streaming_fbank \
  -j$(nproc)
Run the incremental fbank test:
build-host-zipformer/test_streaming_fbank
Run ZipFormer offline:
env ZIPFORMER_ENABLE_REPACK=1 \
  ZIPFORMER_GGML_THREADS=8 \
  OMP_NUM_THREADS=8 \
  build-host-zipformer/zipformer-cli \
  /path/to/zipformer.gguf \
  /path/to/audio.wav \
  /path/to/tokens.txt
Input audio should be 16 kHz mono PCM WAV.
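A quick way to confirm a file meets the input requirement before running the CLI is to read its header with Python's standard `wave` module. This is a minimal sketch and not part of the repository. It additionally assumes 16-bit samples, which the README states only for the websocket endpoints (PCM16LE). The function name `check_wav_format` is hypothetical.

```python
import wave


def check_wav_format(path: str) -> list[str]:
    """Return a list of problems; an empty list means 16 kHz mono 16-bit PCM."""
    problems = []
    # wave.open raises wave.Error for files it cannot parse as PCM WAV.
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        # Assumption: 16-bit samples (2 bytes each).
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems
```

A file that fails any of these checks would need to be resampled or downmixed first with an external tool.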
The RISC-V target is the board at:
192.168.1.7
The current cross route uses GCC 14 and RVV:
docker run --rm \
  -v /home/yongsheng/repos/Kodachi:/workspace \
  -w /workspace \
  kodachi-riscv64-gcc14-cross:latest \
  bash -lc 'cmake -S . -B build-riscv64-zipformer-rvv-streaming -G Ninja \
    -DCMAKE_TOOLCHAIN_FILE=/workspace/cmake/toolchains/riscv64-linux-gnu-gcc14.cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DQWEN3_ASR_NATIVE_ARCH=OFF \
    -DQWEN3_ASR_TIMING=ON \
    -DQWEN3_ASR_GGML_DIR=/workspace/ggml \
    -DGGML_OPENMP=OFF \
    -DGGML_RVV=ON \
    -DGGML_RV_ZFH=ON \
    -DGGML_RV_ZVFH=ON \
    -DGGML_RV_ZICBOP=OFF \
    -DGGML_RV_ZIHINTPAUSE=ON &&
  cmake --build build-riscv64-zipformer-rvv-streaming \
    --target zipformer-streaming-server -j8'
The important runtime environment on the board is:
ZIPFORMER_ENABLE_REPACK=1
ZIPFORMER_GGML_THREADS=8
OMP_NUM_THREADS=8
Recommended remote directory:
/tmp/kodachi-rvv-repack
Expected files:
zipformer-streaming-server
zipformer-mixed-sel-encoder-all-q4_0.gguf
tokens.txt
Install the systemd unit:
scp packaging/zipformer-streaming.service 192.168.1.7:/tmp/zipformer-streaming.service
ssh 192.168.1.7 \
  'sudo mv /tmp/zipformer-streaming.service /etc/systemd/system/zipformer-streaming.service &&
   sudo systemctl daemon-reload &&
   sudo systemctl enable --now zipformer-streaming.service'
Operate the service:
ssh 192.168.1.7 'sudo systemctl start zipformer-streaming.service'
ssh 192.168.1.7 'sudo systemctl stop zipformer-streaming.service'
ssh 192.168.1.7 'systemctl --no-pager --full status zipformer-streaming.service'
ssh 192.168.1.7 'journalctl -u zipformer-streaming.service -n 100 --no-pager'
Health check:
curl http://192.168.1.7:19090/health
Expected response:
{"status":"ok"}
Full service details are in zipformer_streaming_server.md.
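The health check can also be scripted, for example from a monitoring job. The sketch below uses only the Python standard library and treats the JSON body shown above as the success condition. The function names and the 3-second timeout are hypothetical choices, not part of the service.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True when the response body is a JSON object with status == "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url: str = "http://192.168.1.7:19090/health",
                 timeout: float = 3.0) -> bool:
    """Fetch the health endpoint and validate the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_healthy(response.read())
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are OSError subclasses.
        return False
```

Keeping the body check separate from the network call makes the parsing logic testable without a running board.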
The service exposes:
ws://192.168.1.7:19090/soniox
ws://192.168.1.7:19090/bailian
Both endpoints accept PCM16LE audio in websocket binary frames and emit partial hypotheses before the final result.
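On the client side, streaming means cutting raw PCM16LE audio into chunks and sending each chunk as one binary frame. The sketch below shows only the chunking step. It assumes the websocket stream is 16 kHz mono like the CLI input, and the 100 ms chunk length is a hypothetical choice. It opens no connection; any handshake or configuration messages the endpoints expect are covered in zipformer_streaming_server.md, not here.

```python
SAMPLE_RATE_HZ = 16000  # assumption: same rate as the CLI input
BYTES_PER_SAMPLE = 2    # PCM16LE: one little-endian signed 16-bit value per sample


def pcm16_frames(pcm: bytes, chunk_ms: int = 100):
    """Yield successive chunks of mono PCM16LE audio, each at most chunk_ms long.

    Each yielded chunk is intended to be sent as one websocket binary frame.
    """
    if len(pcm) % BYTES_PER_SAMPLE != 0:
        raise ValueError("PCM16 data must contain a whole number of 16-bit samples")
    chunk_bytes = SAMPLE_RATE_HZ * chunk_ms // 1000 * BYTES_PER_SAMPLE
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of silence splits into ten 3200-byte chunks at 100 ms each.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(pcm16_frames(one_second))
```

The final chunk may be shorter than the others when the audio length is not a multiple of the chunk length.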
The verified final transcript on the smoke-test audio was:
MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL
Recommended ZipFormer quantization policy for the board:
encoder 2D weights: Q4_0
decoder 2D weights: Q8_0
joiner 2D weights: Q8_0
biases / small tensors: unchanged or F32
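The policy above can be written as a small selection function. This is a sketch only, not the logic of scripts/selective_zipformer_quant.py: the "encoder." / "decoder." / "joiner." name prefixes are hypothetical, real GGUF tensor names may differ, and leaving unrecognized 2D tensors in F32 is an assumption.

```python
def pick_quant_type(tensor_name: str, n_dims: int) -> str:
    """Map a tensor to a target type following the policy listed above.

    The name prefixes are hypothetical placeholders for however the model
    actually labels its encoder, decoder and joiner tensors.
    """
    if n_dims != 2:
        return "F32"   # biases and other small tensors stay unquantized
    if tensor_name.startswith("encoder."):
        return "Q4_0"  # bulk of the weights
    if tensor_name.startswith(("decoder.", "joiner.")):
        return "Q8_0"  # kept at higher precision
    return "F32"       # assumption: unknown 2D tensors are left alone
```

For example, `pick_quant_type("encoder.layer0.weight", 2)` selects Q4_0, while a 1D bias under the same prefix stays F32.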
The current GGML RISC-V repack path is useful for Q4_0. Q4_K and Q8_0 do not currently receive the same RISC-V CPU_REPACK treatment in this tree.
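As background for the Q4_0 / Q8_0 split: in upstream GGML both formats store weights in blocks of 32 with one fp16 scale per block, which gives 18 bytes per Q4_0 block and 34 bytes per Q8_0 block. The sketch below estimates storage from those block sizes. The 512 x 512 matrix is a hypothetical example, not a tensor from the model, and the sketch assumes the vendored GGML uses the upstream block layouts.

```python
BLOCK_SIZE = 32  # weights per block for both Q4_0 and Q8_0

# Bytes per block: a 2-byte fp16 scale plus the packed quantized values.
BLOCK_BYTES = {
    "Q4_0": 2 + BLOCK_SIZE // 2,  # 32 four-bit values packed into 16 bytes
    "Q8_0": 2 + BLOCK_SIZE,       # 32 eight-bit values in 32 bytes
}


def quantized_bytes(n_weights: int, quant_type: str) -> int:
    """Storage in bytes for n_weights values in the given block format."""
    if n_weights % BLOCK_SIZE != 0:
        raise ValueError("weight count must be a multiple of the block size")
    return n_weights // BLOCK_SIZE * BLOCK_BYTES[quant_type]


def bits_per_weight(quant_type: str) -> float:
    """Effective bits per weight, including the per-block scale."""
    return 8 * BLOCK_BYTES[quant_type] / BLOCK_SIZE


# Hypothetical 512 x 512 weight matrix.
q4_size = quantized_bytes(512 * 512, "Q4_0")  # 147456 bytes
q8_size = quantized_bytes(512 * 512, "Q8_0")  # 278528 bytes
```

At 4.5 versus 8.5 effective bits per weight, Q4_0 takes a little over half the space of Q8_0 for the same tensor.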
- implementation_report_2026-04-13.md: detailed Qwen3-ASR and ZipFormer implementation report, benchmark history, and conclusions.
- zipformer_streaming_server.md: websocket protocol, RISC-V installation, systemd service, and operations.
- Use /tmp or ignored models/ directories for local model artifacts.
- Do not commit GGUF, ONNX, WAV, or build outputs.
- ggml/ is vendored source, not a submodule.
- third_party/nlohmann/ is kept as the only standalone third-party header dependency outside GGML.