v0.1.0

make1986 released this 04 Jun 11:08

First public release of the VoxRT wake-word model — detects the phrase "Hey Assistant" on the VoxRT custom on-device inference runtime.

Quality

Held-out test split: 5,240 positive utterances + 6,416 hard-negative utterances
(isolated "Hey", isolated "Assistant", competitor wake-words like "Hey Siri",
phonetic neighbours, arbitrary speech, non-speech audio). Speakers disjoint
from train + val.

  • ROC AUC: 0.9966
  • Average precision (PR AUC): 0.9899

At the recommended deploy threshold of 0.9:

Metric      Test value
Precision   0.993
Recall      0.982
F1          0.987
FPR         0.5 %
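
As a sanity check, the threshold metrics can be turned back into approximate confusion-matrix counts for the test split. The sketch below uses only the split sizes and rounded metrics quoted above, so the counts are estimates rather than the exact evaluation output; the function names are illustrative.

```python
# Test-split sizes and threshold-0.9 metrics as quoted in these release notes.
N_POS, N_NEG = 5240, 6416
RECALL, FPR = 0.982, 0.005


def approx_counts(n_pos, n_neg, recall, fpr):
    """Estimate (TP, FN, FP, TN) from rounded recall and false-positive rate."""
    tp = round(n_pos * recall)
    fn = n_pos - tp
    fp = round(n_neg * fpr)
    tn = n_neg - fp
    return tp, fn, fp, tn


def precision_f1(tp, fn, fp):
    """Recompute precision and F1 from the estimated counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, f1


tp, fn, fp, tn = approx_counts(N_POS, N_NEG, RECALL, FPR)
precision, f1 = precision_f1(tp, fn, fp)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"precision={precision:.3f} f1={f1:.3f}")
```

Under these rounded inputs the estimate is about 5,146 detections and 94 misses among the positives, and about 32 false accepts among the 6,416 hard negatives. The recomputed precision and F1 agree with the table to within rounding.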

Architecture + footprint

  • 8-block depthwise-separable Conv1D, dilations [1, 2, 4, 4, 4, 2, 2, 1], 64 channels
  • ~48 K parameters, fp16 weights, AES-256-GCM at-rest encryption
  • ~100 KB on disk (the .vxrt artefact below)
  • 64-bin Slaney-norm mel frontend, 16 kHz mono PCM input, 200-frame (2 s) sliding window
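
The footprint and framing figures above can be cross-checked with a few lines of arithmetic. This is a rough sketch: the 10 ms hop is inferred from "200 frames = 2 s" and is an assumption, and the parameter count is the rounded ~48 K figure.

```python
# Figures quoted in the bullets above (parameter count is approximate).
PARAMS = 48_000
BYTES_PER_FP16 = 2
SAMPLE_RATE_HZ = 16_000
WINDOW_FRAMES = 200
WINDOW_SECONDS = 2
N_MELS = 64

# Raw fp16 weight payload, before any container or encryption overhead.
weight_bytes = PARAMS * BYTES_PER_FP16

# Assumption: frames are evenly spaced, so hop = window length / frame count.
window_samples = SAMPLE_RATE_HZ * WINDOW_SECONDS
hop_samples = window_samples // WINDOW_FRAMES
hop_ms = 1000 * hop_samples / SAMPLE_RATE_HZ

# Size of one model input window in mel-feature values.
features_per_window = WINDOW_FRAMES * N_MELS

print(f"weights: {weight_bytes} bytes")
print(f"window: {window_samples} samples, hop: {hop_samples} samples ({hop_ms} ms)")
print(f"features per window: {features_per_window}")
```

The raw weights come to 96,000 bytes, which is consistent with the ~100 KB artefact once file-format and encryption overhead is included (the exact breakdown is not stated here). Each 2 s window covers 32,000 samples and, under the even-spacing assumption, advances 160 samples (10 ms) per frame.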

Runtime performance

arm64-v8a release builds, post-warmup, RTF = wall-time-per-frame ÷ frame audio duration:

Device                              SoC                      RTF
Xiaomi Redmi 9C (Cortex-A73 pin)    SD 662 (midrange 2020)   0.021
iPhone 13 Pro Max                   Apple A15 Bionic         0.015

≈ 50–65× faster than realtime on phone-class hardware (1 ÷ RTF gives about 48× on the Redmi 9C and 67× on the iPhone), well within an always-on power budget.
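
To make the RTF figures concrete, the sketch below converts them into a realtime speed-up, wall time per frame, and the share of one core kept busy. The 10 ms frame duration is an assumption derived from the 200-frame / 2 s window and is not stated for the benchmark itself. The helper name is illustrative.

```python
# RTF values from the table above.
RTF_BY_DEVICE = {
    "Xiaomi Redmi 9C": 0.021,
    "iPhone 13 Pro Max": 0.015,
}

# Assumption: each frame covers 10 ms of audio (200 frames per 2 s window).
FRAME_MS = 10.0


def summarize(rtf, frame_ms=FRAME_MS):
    """Derive speed-up, per-frame wall time, and single-core duty cycle from RTF."""
    return {
        "speedup": 1.0 / rtf,                 # how many times faster than realtime
        "wall_ms_per_frame": rtf * frame_ms,  # compute time spent per audio frame
        "core_busy_pct": rtf * 100.0,         # share of one core used continuously
    }


for device, rtf in RTF_BY_DEVICE.items():
    s = summarize(rtf)
    print(
        f"{device}: {s['speedup']:.1f}x realtime, "
        f"{s['wall_ms_per_frame']:.2f} ms/frame, "
        f"{s['core_busy_pct']:.1f}% of one core"
    )
```

Read this way, continuous listening keeps roughly 1.5–2.1 % of a single core busy on the two devices measured.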

Install

Pair with one of the consumer libraries:

Or download directly + verify:

curl -L -o voxrt_wake_word.vxrt \
  https://github.com/VoxRT/voxrt-wake-word-models/releases/download/v0.1.0/voxrt_wake_word.vxrt
echo "9d40bdc132a2ad8e85bd8a28bb49b77c51a7c62f60567222a037e44418510e8f  voxrt_wake_word.vxrt" | shasum -a 256 -c
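
Where `shasum` is not available (for example in a build script or on Windows), the same check can be done with Python's standard library. This is a minimal sketch using the digest published above; the function names and the local file path are illustrative.

```python
import hashlib
import hmac
from pathlib import Path

# SHA-256 digest published in these release notes for voxrt_wake_word.vxrt.
EXPECTED_SHA256 = "9d40bdc132a2ad8e85bd8a28bb49b77c51a7c62f60567222a037e44418510e8f"


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path, expected=EXPECTED_SHA256):
    """True if the file's SHA-256 matches the expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected.lower())


if __name__ == "__main__":
    model = Path("voxrt_wake_word.vxrt")  # assumed download location
    if model.exists():
        print("OK" if verify_model(model) else "MISMATCH: do not load this file")
    else:
        print(f"{model} not found; download it first")
```

Running the check before handing the file to the runtime catches truncated or tampered downloads early.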

License

VoxRT proprietary. Redistribution as part of the unmodified voxrt-wake-word-{android,ios} libraries is permitted for commercial apps without
per-installation fees. See LICENSE for full terms. Custom phrases, multi-phrase detection, and language extension are available via the commercial
VoxRT SDK; contact help@voxrt.com.