Kodachi

Kodachi is a C++ / GGML ASR research repository for CPU-first speech recognition, with RISC-V as the main deployment target.

The current practical path is ZipFormer streaming ASR:

Chinese-English ZipFormer RNNT runtime in C++ and GGML
x86 host bring-up and profiling
RISC-V GCC 14 cross compilation
RVV 1.0 + GGML Q4_0 repack experiments
selective quantization for the ZipFormer encoder
local websocket service compatible with Shinsoku-style Soniox and Bailian streaming backends
systemd deployment on the RISC-V board

The repository also keeps the earlier Qwen3-ASR / ForcedAligner CPU work as a research baseline. That path includes K-Quant, TurboQuant, rotated tiled experiments, and RVV bring-up, but it is not the recommended real-time service target on the current board.

Current Conclusion

Qwen3-ASR 0.6B is too large for the current RISC-V board if the goal is real-time local ASR. The best measured GCC 14 + RVV Qwen path still ran at a wall-clock real-time factor (RTF_wall) of about 4.57 on a short sample, roughly 4.6 times slower than real time.

ZipFormer is the active deployment path. The current selective-quant ZipFormer model keeps decoder and joiner in Q8_0 while quantizing encoder 2D weights to Q4_0. On the RISC-V board this produced correct output on the tested sample at about:

wall RTF:    0.4802
encoder RTF: 0.3483
peak RSS:    376908 KB
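
For readers new to the metric: the real-time factor is processing time divided by audio duration, so values below 1.0 mean faster than real time. The following is a minimal Python sketch of that arithmetic, not code from this repository. The function names and the 10-second clip length are illustrative assumptions; only the two RTF values come from the measurements in this README.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def processing_time(rtf: float, audio_seconds: float) -> float:
    """Invert the definition: expected processing time for a clip at a given RTF."""
    return rtf * audio_seconds


if __name__ == "__main__":
    # RTF values are the ones reported in this README; the clip length is hypothetical,
    # and real timings will vary with the audio and the board state.
    clip_seconds = 10.0
    for name, rtf in (("ZipFormer selective-quant", 0.4802), ("Qwen3-ASR 0.6B", 4.57)):
        seconds = processing_time(rtf, clip_seconds)
        verdict = "real-time capable" if rtf < 1.0 else "slower than real time"
        print(f"{name}: ~{seconds:.2f} s for {clip_seconds:.0f} s of audio ({verdict})")
```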

See docs/implementation_report_2026-04-13.md for the full implementation and benchmark history.

Repository Layout

src/
  qwen3_asr.*                 Qwen3-ASR orchestration
  forced_aligner.*            Qwen3 forced aligner
  audio_encoder.*             Qwen3 audio encoder
  text_decoder.*              Qwen text decoder
  turboquant_runtime.*        TurboQuant runtime experiment
  zipformer_*.{h,cpp}         ZipFormer GGML runtime

tools/
  zipformer_cli.cpp           Offline ZipFormer CLI
  zipformer_streaming_server.cpp
                                Websocket streaming ASR service
  zipformer_onnx_to_gguf.py   ZipFormer ONNX to GGUF converter
  requantize_gguf.cpp         GGUF quantization helper

scripts/
  build_riscv64_gcc14_cross.sh
  selective_zipformer_quant.py
  benchmark_*.py
  turboquant_*.py

docs/
  implementation_report_2026-04-13.md
  zipformer_streaming_server.md

packaging/
  zipformer-streaming.service

ggml/
  Vendored GGML source

third_party/nlohmann/
  Header-only JSON dependency for the websocket server

Large artifacts are intentionally not tracked:

GGUF models
ONNX exports
WAV/audio corpora
build directories
benchmark dumps
temporary converted models

Build On x86

Configure and build the ZipFormer CLI and streaming server:

cmake -S . -B build-host-zipformer -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DQWEN3_ASR_NATIVE_ARCH=ON \
  -DQWEN3_ASR_TIMING=ON \
  -DGGML_CCACHE=OFF

cmake --build build-host-zipformer \
  --target zipformer-cli zipformer-streaming-server test_streaming_fbank \
  -j$(nproc)

Run the incremental fbank test:

build-host-zipformer/test_streaming_fbank

Run ZipFormer offline:

env ZIPFORMER_ENABLE_REPACK=1 \
    ZIPFORMER_GGML_THREADS=8 \
    OMP_NUM_THREADS=8 \
    build-host-zipformer/zipformer-cli \
      /path/to/zipformer.gguf \
      /path/to/audio.wav \
      /path/to/tokens.txt

Input audio should be 16 kHz mono PCM WAV.
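
A quick way to confirm a file matches that format before running the CLI is to inspect its header. The sketch below uses only Python's standard `wave` module and is not part of this repository; `check_wav` is a hypothetical helper name, and the 16-bit sample width check is an assumption, since this README only specifies 16 kHz mono PCM.

```python
import wave


def check_wav(path: str) -> list[str]:
    """Return a list of format problems; an empty list means the file looks usable."""
    problems = []
    # wave.open raises wave.Error for files it cannot parse as PCM WAV.
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        # Assumption: 16-bit samples. The README does not state the bit depth.
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems
```

An empty result from `check_wav("/path/to/audio.wav")` suggests the file can be passed to `zipformer-cli` as is; otherwise resample or downmix it first.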

RISC-V Cross Build

The RISC-V target is the board at:

192.168.1.7

The current cross-compilation route uses GCC 14 with RVV enabled:

docker run --rm \
  -v /home/yongsheng/repos/Kodachi:/workspace \
  -w /workspace \
  kodachi-riscv64-gcc14-cross:latest \
  bash -lc 'cmake -S . -B build-riscv64-zipformer-rvv-streaming -G Ninja \
    -DCMAKE_TOOLCHAIN_FILE=/workspace/cmake/toolchains/riscv64-linux-gnu-gcc14.cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DQWEN3_ASR_NATIVE_ARCH=OFF \
    -DQWEN3_ASR_TIMING=ON \
    -DQWEN3_ASR_GGML_DIR=/workspace/ggml \
    -DGGML_OPENMP=OFF \
    -DGGML_RVV=ON \
    -DGGML_RV_ZFH=ON \
    -DGGML_RV_ZVFH=ON \
    -DGGML_RV_ZICBOP=OFF \
    -DGGML_RV_ZIHINTPAUSE=ON &&
    cmake --build build-riscv64-zipformer-rvv-streaming \
      --target zipformer-streaming-server -j8'

Set these runtime environment variables on the board:

ZIPFORMER_ENABLE_REPACK=1
ZIPFORMER_GGML_THREADS=8
OMP_NUM_THREADS=8

RISC-V Service Deployment

Recommended remote directory:

/tmp/kodachi-rvv-repack

Expected files:

zipformer-streaming-server
zipformer-mixed-sel-encoder-all-q4_0.gguf
tokens.txt

Install the systemd unit:

scp packaging/zipformer-streaming.service 192.168.1.7:/tmp/zipformer-streaming.service
ssh 192.168.1.7 \
  'sudo mv /tmp/zipformer-streaming.service /etc/systemd/system/zipformer-streaming.service &&
   sudo systemctl daemon-reload &&
   sudo systemctl enable --now zipformer-streaming.service'

Operate the service:

ssh 192.168.1.7 'sudo systemctl start zipformer-streaming.service'
ssh 192.168.1.7 'sudo systemctl stop zipformer-streaming.service'
ssh 192.168.1.7 'systemctl --no-pager --full status zipformer-streaming.service'
ssh 192.168.1.7 'journalctl -u zipformer-streaming.service -n 100 --no-pager'

Health check:

curl http://192.168.1.7:19090/health

Expected response:

{"status":"ok"}

Full service details are in docs/zipformer_streaming_server.md.

Websocket Endpoints

The service exposes:

ws://192.168.1.7:19090/soniox
ws://192.168.1.7:19090/bailian

Both endpoints accept PCM16LE audio in websocket binary frames and emit partial hypotheses before the final result.
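
On the client side, audio has to be packed as little-endian signed 16-bit samples and split into binary frames. The sketch below shows only that packing and chunking step in standard-library Python; it is not the repository's client. It assumes the endpoints expect the same 16 kHz mono audio as the offline CLI, and the 100 ms chunk size is an arbitrary illustrative choice. The message framing around the audio is described in the server documentation and is not reproduced here.

```python
import struct
from typing import Iterator, Sequence

SAMPLE_RATE = 16000    # assumption: same rate as the offline CLI input
BYTES_PER_SAMPLE = 2   # PCM16LE: signed 16-bit, little-endian


def floats_to_pcm16le(samples: Sequence[float]) -> bytes:
    """Clip samples to [-1, 1] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def frames(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield chunks holding chunk_ms of audio each; the last chunk may be shorter."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]
```

Each chunk yielded by `frames` would then be sent as one websocket binary frame by whichever websocket client library is in use.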

The verified final transcript on the smoke-test audio was:

MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL

Model Policy

Recommended ZipFormer quantization policy for the board:

encoder 2D weights: Q4_0
decoder 2D weights: Q8_0
joiner 2D weights: Q8_0
biases / small tensors: unchanged or F32
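
The policy above can be expressed as a small per-tensor selection rule. The following Python sketch is illustrative only and may differ from what scripts/selective_zipformer_quant.py actually does: the `encoder.`, `decoder.`, and `joiner.` name prefixes are assumptions about tensor naming, and `pick_quant_type` and `storage_bytes` are hypothetical names. The block sizes are the standard GGML layouts: Q4_0 stores 32 weights in 18 bytes and Q8_0 stores 32 weights in 34 bytes.

```python
# Bytes per block and weights per block for the GGML types used in the policy.
BLOCK_LAYOUT = {
    "Q4_0": (18, 32),  # fp16 scale + 16 bytes of packed 4-bit values
    "Q8_0": (34, 32),  # fp16 scale + 32 int8 values
    "F32": (4, 1),
}


def pick_quant_type(tensor_name: str, n_dims: int) -> str:
    """Map a tensor to a target type following the README policy."""
    # Biases and other non-2D tensors stay in full precision in this sketch
    # (the policy also allows leaving them unchanged).
    if n_dims != 2:
        return "F32"
    # Assumption: tensors are prefixed by the sub-model they belong to.
    if tensor_name.startswith("encoder."):
        return "Q4_0"
    if tensor_name.startswith(("decoder.", "joiner.")):
        return "Q8_0"
    return "F32"


def storage_bytes(n_weights: int, qtype: str) -> int:
    """Storage for n_weights values; n_weights must be a multiple of the block size."""
    block_bytes, block_weights = BLOCK_LAYOUT[qtype]
    if n_weights % block_weights != 0:
        raise ValueError(f"{qtype} needs a multiple of {block_weights} weights")
    return n_weights // block_weights * block_bytes
```

Under these layouts Q4_0 costs 4.5 bits per weight and Q8_0 costs 8.5 bits per weight, which is why the encoder, the largest component, is the one moved to Q4_0 while the smaller decoder and joiner keep the higher-precision type.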

The current GGML RISC-V repack path is useful for Q4_0. Q4_K and Q8_0 do not currently receive the same RISC-V CPU_REPACK treatment in this tree.

Documentation

implementation_report_2026-04-13.md: detailed Qwen3-ASR and ZipFormer implementation report, benchmark history, and conclusions.
zipformer_streaming_server.md: websocket protocol, RISC-V installation, systemd service, and operations.

Notes

Use /tmp or ignored models/ directories for local model artifacts.
Do not commit GGUF, ONNX, WAV, or build outputs.
ggml/ is vendored source, not a submodule.
third_party/nlohmann/ is kept as the only standalone third-party header dependency outside GGML.
