chux0519/Kodachi

Kodachi

Kodachi is a C++ / GGML ASR research repository for CPU-first speech recognition, with RISC-V as the main deployment target.

The current practical path is ZipFormer streaming ASR:

  • Chinese-English ZipFormer RNNT runtime in C++ and GGML
  • x86 host bring-up and profiling
  • RISC-V GCC 14 cross compilation
  • RVV 1.0 + GGML Q4_0 repack experiments
  • selective quantization for the ZipFormer encoder
  • local websocket service compatible with Shinsoku-style Soniox and Bailian streaming backends
  • systemd deployment on the RISC-V board

The repository also keeps the earlier Qwen3-ASR / ForcedAligner CPU work as a research baseline. That path includes K-Quant, TurboQuant, rotated tiled experiments, and RVV bring-up, but it is not the recommended real-time service target on the current board.

Current Conclusion

Qwen3-ASR 0.6B is too large for the current RISC-V board if the goal is real-time local ASR. The best measured GCC 14 + RVV Qwen path still had a wall-clock real-time factor (RTF_wall) of about 4.57 on a short sample, that is, roughly 4.6 times slower than real time.

ZipFormer is the active deployment path. The current selective-quant ZipFormer model keeps decoder and joiner in Q8_0 while quantizing encoder 2D weights to Q4_0. On the RISC-V board this produced correct output on the tested sample at about:

wall RTF:    0.4802
encoder RTF: 0.3483
peak RSS:    376908 KB
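
For readers new to the metric, the sketch below shows how a real-time factor is usually computed. It assumes the common definition (processing time divided by audio duration); the repository's own timing code may measure it differently, and the timings used here are hypothetical, not measurements.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (the usual RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_real_time(rtf: float) -> bool:
    """An RTF below 1.0 means audio is processed faster than it is spoken."""
    return rtf < 1.0


# Hypothetical timings, not taken from the benchmark report.
rtf = real_time_factor(processing_seconds=4.802, audio_seconds=10.0)
print(f"wall RTF: {rtf:.4f}, real-time capable: {is_real_time(rtf)}")
```

Under this definition a wall RTF near 0.48 leaves headroom for streaming, while an RTF of 4.57 cannot keep up with live audio.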

See implementation_report_2026-04-13.md for the full implementation and benchmark history.

Repository Layout

src/
  qwen3_asr.*                 Qwen3-ASR orchestration
  forced_aligner.*            Qwen3 forced aligner
  audio_encoder.*             Qwen3 audio encoder
  text_decoder.*              Qwen text decoder
  turboquant_runtime.*        TurboQuant runtime experiment
  zipformer_*.{h,cpp}         ZipFormer GGML runtime

tools/
  zipformer_cli.cpp           Offline ZipFormer CLI
  zipformer_streaming_server.cpp
                                Websocket streaming ASR service
  zipformer_onnx_to_gguf.py   ZipFormer ONNX to GGUF converter
  requantize_gguf.cpp         GGUF quantization helper

scripts/
  build_riscv64_gcc14_cross.sh
  selective_zipformer_quant.py
  benchmark_*.py
  turboquant_*.py

docs/
  implementation_report_2026-04-13.md
  zipformer_streaming_server.md

packaging/
  zipformer-streaming.service

ggml/
  Vendored GGML source

third_party/nlohmann/
  Header-only JSON dependency for the websocket server

Large artifacts are intentionally not tracked:

  • GGUF models
  • ONNX exports
  • WAV/audio corpora
  • build directories
  • benchmark dumps
  • temporary converted models

Build On x86

Configure and build the ZipFormer CLI and streaming server:

cmake -S . -B build-host-zipformer -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DQWEN3_ASR_NATIVE_ARCH=ON \
  -DQWEN3_ASR_TIMING=ON \
  -DGGML_CCACHE=OFF

cmake --build build-host-zipformer \
  --target zipformer-cli zipformer-streaming-server test_streaming_fbank \
  -j$(nproc)

Run the incremental fbank test:

build-host-zipformer/test_streaming_fbank

Run ZipFormer offline:

env ZIPFORMER_ENABLE_REPACK=1 \
    ZIPFORMER_GGML_THREADS=8 \
    OMP_NUM_THREADS=8 \
    build-host-zipformer/zipformer-cli \
      /path/to/zipformer.gguf \
      /path/to/audio.wav \
      /path/to/tokens.txt

Input audio should be 16 kHz mono PCM WAV.
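
A quick way to check a file before passing it to the CLI is sketched below with Python's standard `wave` module. The helper name `check_wav` is hypothetical, and the 16-bit sample width is an assumption: the requirement above only says PCM.

```python
import wave


def check_wav(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    # wave.open raises wave.Error for files it cannot parse as PCM WAV.
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        # Assumption: 16-bit samples (the README does not state a bit depth).
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems
```

Calling `check_wav("/path/to/audio.wav")` and printing the returned list is enough to spot a file that needs resampling or downmixing first.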

RISC-V Cross Build

The RISC-V target is the board at:

192.168.1.7

The current cross route uses GCC 14 and RVV:

docker run --rm \
  -v /home/yongsheng/repos/Kodachi:/workspace \
  -w /workspace \
  kodachi-riscv64-gcc14-cross:latest \
  bash -lc 'cmake -S . -B build-riscv64-zipformer-rvv-streaming -G Ninja \
    -DCMAKE_TOOLCHAIN_FILE=/workspace/cmake/toolchains/riscv64-linux-gnu-gcc14.cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DQWEN3_ASR_NATIVE_ARCH=OFF \
    -DQWEN3_ASR_TIMING=ON \
    -DQWEN3_ASR_GGML_DIR=/workspace/ggml \
    -DGGML_OPENMP=OFF \
    -DGGML_RVV=ON \
    -DGGML_RV_ZFH=ON \
    -DGGML_RV_ZVFH=ON \
    -DGGML_RV_ZICBOP=OFF \
    -DGGML_RV_ZIHINTPAUSE=ON &&
    cmake --build build-riscv64-zipformer-rvv-streaming \
      --target zipformer-streaming-server -j8'

The important runtime environment on the board is:

ZIPFORMER_ENABLE_REPACK=1
ZIPFORMER_GGML_THREADS=8
OMP_NUM_THREADS=8

RISC-V Service Deployment

Recommended remote directory:

/tmp/kodachi-rvv-repack

Expected files:

zipformer-streaming-server
zipformer-mixed-sel-encoder-all-q4_0.gguf
tokens.txt

Install the systemd unit:

scp packaging/zipformer-streaming.service 192.168.1.7:/tmp/zipformer-streaming.service
ssh 192.168.1.7 \
  'sudo mv /tmp/zipformer-streaming.service /etc/systemd/system/zipformer-streaming.service &&
   sudo systemctl daemon-reload &&
   sudo systemctl enable --now zipformer-streaming.service'

Operate the service:

ssh 192.168.1.7 'sudo systemctl start zipformer-streaming.service'
ssh 192.168.1.7 'sudo systemctl stop zipformer-streaming.service'
ssh 192.168.1.7 'systemctl --no-pager --full status zipformer-streaming.service'
ssh 192.168.1.7 'journalctl -u zipformer-streaming.service -n 100 --no-pager'

Health check:

curl http://192.168.1.7:19090/health

Expected response:

{"status":"ok"}
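
The same check can be scripted, for example from a monitoring job. The sketch below uses only the Python standard library; the function names are hypothetical, and treating HTTP 200 as the success status is an assumption not stated in the service documentation quoted here.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True when the body is a JSON object whose "status" is "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(url: str = "http://192.168.1.7:19090/health",
                 timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Assumption: a healthy service answers with HTTP 200.
            return response.status == 200 and is_healthy(response.read())
    except OSError:  # URLError, HTTPError, and timeouts all derive from OSError
        return False
```

Parsing the JSON rather than comparing raw strings keeps the check robust to whitespace or key-order differences in the response.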

Full service details are in zipformer_streaming_server.md.

Websocket Endpoints

The service exposes:

ws://192.168.1.7:19090/soniox
ws://192.168.1.7:19090/bailian

Both endpoints accept PCM16LE audio in websocket binary frames and emit partial hypotheses before the final result.
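
To illustrate what a client has to prepare, the sketch below converts float samples to PCM16LE and splits the result into chunks that could each be sent as one binary frame. The 100 ms chunk size, the 16 kHz rate for streaming, and both function names are assumptions; any control messages the endpoints expect are described in zipformer_streaming_server.md and are not covered here.

```python
import struct
from typing import Iterator


def floats_to_pcm16le(samples: list[float]) -> bytes:
    """Clip samples to [-1.0, 1.0] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm_frames(pcm: bytes, frame_ms: int = 100,
               sample_rate: int = 16000) -> Iterator[bytes]:
    """Yield consecutive chunks of frame_ms milliseconds; the last may be shorter."""
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per int16 sample
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


# One second of silence split into ten 100 ms chunks of 3200 bytes each.
chunks = list(pcm_frames(floats_to_pcm16le([0.0] * 16000)))
print(len(chunks), len(chunks[0]))
```

Each chunk would then be handed to whichever websocket client library is in use as a binary message.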

The verified final transcript on the smoke-test audio was:

MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL

Model Policy

Recommended ZipFormer quantization policy for the board:

encoder 2D weights: Q4_0
decoder 2D weights: Q8_0
joiner 2D weights: Q8_0
biases / small tensors: unchanged or F32
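
Expressed as code, the policy amounts to a small lookup. This is an illustrative sketch only, not the logic of scripts/selective_zipformer_quant.py: the function name and the `encoder.` / `decoder.` / `joiner.` name prefixes are assumptions about how tensors are named in the GGUF file.

```python
def pick_quant_type(tensor_name: str, n_dims: int) -> str:
    """Map a tensor to the quantization type suggested by the policy above."""
    if n_dims != 2:
        return "unchanged"  # biases and other small tensors are left alone
    # Assumed naming scheme: the component name is the tensor-name prefix.
    if tensor_name.startswith("encoder."):
        return "Q4_0"
    if tensor_name.startswith(("decoder.", "joiner.")):
        return "Q8_0"
    return "unchanged"  # unknown component: do not touch it


for name, dims in [("encoder.layer0.weight", 2),
                   ("decoder.embedding.weight", 2),
                   ("joiner.output.bias", 1)]:
    print(name, "->", pick_quant_type(name, dims))
```

Keeping the rule in one function makes it easy to try a different split, such as leaving selected encoder layers in Q8_0, without touching the conversion loop.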

In this tree, the GGML RISC-V CPU_REPACK path currently applies to Q4_0 only; Q4_K and Q8_0 weights do not receive the same treatment.

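Documentation

  • docs/implementation_report_2026-04-13.md: full implementation and benchmark history
  • docs/zipformer_streaming_server.md: streaming service details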

Notes

  • Use /tmp or ignored models/ directories for local model artifacts.
  • Do not commit GGUF, ONNX, WAV, or build outputs.
  • ggml/ is vendored source, not a submodule.
  • third_party/nlohmann/ is kept as the only standalone third-party header dependency outside GGML.
