
feat: wake word detection with openWakeWord#18

Merged
0xharkirat merged 14 commits into main from feat/wake-word
Apr 11, 2026

@0xharkirat
Contributor

Summary

  • "Hey Hark" wake word detection using openWakeWord (Apache 2.0) + ONNX Runtime
  • Custom trained model (201KB) for "Hey Hark" phrase via openWakeWord Colab notebook
  • Full Pigeon integration: startWakeWordService/stopWakeWordService/isWakeWordRunning/setWakeWordPaused
  • Auto-starts mic on detection, pauses during STT, resumes after
  • Works even when app is backgrounded (engine persists in HarkApplication)

How it works

AudioRecord (16kHz) -> openWakeWord Engine -> "Hey Hark" detected
    -> HarkResultFlutterApi.onWakeWordDetected()
        -> ChatNotifier.onMicPressed() (auto-start STT)

Mutual exclusion with STT: wake word engine fully stops during STT (AudioRecord conflict on Moto G56), restarts after command completes.
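The pause/resume behaviour described above can be sketched as a small state machine in which `pause()` releases the mic completely and `resume()` reacquires it. This is a minimal sketch, not the actual WakeWordDetector.kt: `MicEngine`, `WakeWordLifecycle`, `WakeWordState`, and `CountingEngine` are hypothetical names, and the real class wraps openWakeWord's engine and an `AudioRecord` rather than this interface.

```kotlin
// Hypothetical stand-in for the AudioRecord + openWakeWord engine pair.
// This is NOT the openWakeWord library API.
interface MicEngine {
    fun open()   // acquire the mic and start inference
    fun close()  // stop inference and release the mic
}

enum class WakeWordState { STOPPED, RUNNING, PAUSED }

class WakeWordLifecycle(
    private val engine: MicEngine,
    private val onDetected: () -> Unit,
) {
    var state = WakeWordState.STOPPED
        private set

    fun start() {
        if (state == WakeWordState.STOPPED) {
            engine.open()
            state = WakeWordState.RUNNING
        }
    }

    fun stop() {
        // If paused, the mic is already released; only close when running.
        if (state == WakeWordState.RUNNING) engine.close()
        state = WakeWordState.STOPPED
    }

    // Before STT starts: fully release the mic so the recognizer can take it.
    fun pause() {
        if (state == WakeWordState.RUNNING) {
            engine.close()
            state = WakeWordState.PAUSED
        }
    }

    // After the command completes: reacquire the mic.
    // The engine restarts with an empty feature buffer.
    fun resume() {
        if (state == WakeWordState.PAUSED) {
            engine.open()
            state = WakeWordState.RUNNING
        }
    }

    // Invoked from the engine's detection callback; ignored unless running.
    fun onEngineDetection() {
        if (state == WakeWordState.RUNNING) onDetected()
    }
}

// Fake engine that just counts open/close calls.
class CountingEngine : MicEngine {
    var opens = 0
    var closes = 0
    val isOpen: Boolean get() = opens > closes
    override fun open() { opens++ }
    override fun close() { closes++ }
}
```

Keeping `PAUSED` distinct from `STOPPED` is one way to let `isWakeWordRunning` tell a paused detector apart from a stopped one.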

Key files

| File | Purpose |
|---|---|
| WakeWordDetector.kt | Wraps openWakeWord engine with start/stop/pause/resume |
| HarkPlatformPlugin.kt | Implements Pigeon stubs, wires detector to FlutterApi |
| messages.dart | Added wake word APIs to HarkCommonApi + HarkResultFlutterApi |
| oacp_result_service.dart | Exposes wakeWordDetections stream |
| chat_notifier.dart | Subscribes to wake word stream, auto-mic, pause/resume around STT |
| hey_harkh.onnx | Custom trained wake word model (201KB) |
| wake-word-phases.md | Phased plan: Phase 1 (in-app), Phase 2 (background service), Phase 3 (continuous session + AEC) |
wake-word-phases.md Phased plan: Phase 1 (in-app), Phase 2 (background service), Phase 3 (continuous session + AEC)

Dependencies added

  • xyz.rementia:openwakeword:0.1.4 (Maven Central, Apache 2.0)
  • com.github.gkonovalov.android-vad:silero:2.0.10 (JitPack, MIT)
  • JitPack repo added to both plugin and app build.gradle
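The dependency setup above might look roughly like this in Gradle Kotlin DSL. It is a sketch, not the PR's actual build files: the plugin's build file may be Groovy, and the ONNX Runtime coordinate `com.microsoft.onnxruntime:onnxruntime-android` used for the version pin is an assumption that should be matched against whatever the dependency tree really pulls in.

```kotlin
// App android/build.gradle.kts: JitPack must be declared here too, because
// the plugin's allprojects block does not reach the app module's resolution.
allprojects {
    repositories {
        google()
        mavenCentral()
        maven { url = uri("https://jitpack.io") }
    }
}

// Plugin module dependencies.
dependencies {
    implementation("xyz.rementia:openwakeword:0.1.4")                  // Maven Central
    implementation("com.github.gkonovalov.android-vad:silero:2.0.10")  // JitPack
}

// Pin ONNX Runtime to one version across all dependencies.
// Assumed artifact coordinate; verify against the resolved dependency tree.
configurations.all {
    resolutionStrategy.force("com.microsoft.onnxruntime:onnxruntime-android:1.23.0")
}
```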

Known limitations

  • ~25 second buffer rebuild after STT (engine restart clears audio context)
  • Auto-starts on main engine only (no background foreground service yet)
  • No settings screen / toggle (auto-starts after init)
  • Green mic indicator when active (Android privacy feature, unavoidable)

Test plan

  • App loads, wake word auto-starts after splash
  • Say "Hey Hark" -> mic auto-starts listening
  • Say a command ("turn on the flashlight") -> dispatches correctly
  • After command, say "Hey Hark" again -> re-triggers (with buffer rebuild delay)
  • Normal long-press home overlay still works alongside wake word

🤖 Generated with Claude Code

0xharkirat and others added 14 commits April 11, 2026 15:27
Wire end-to-end wake word detection pipeline:

Kotlin side:
- WakeWordDetector.kt: wraps openWakeWord's WakeWordEngine with
  start/stop/pause/resume lifecycle
- HarkPlatformPlugin: implements startWakeWordService/stopWakeWordService/
  isWakeWordRunning/setWakeWordPaused, fires onWakeWordDetected callback

Pigeon schema:
- Added startWakeWordService, stopWakeWordService, isWakeWordRunning,
  setWakeWordPaused to HarkCommonApi
- Added onWakeWordDetected to HarkResultFlutterApi

Dart side:
- OacpResultService: exposes wakeWordDetections stream
- ChatNotifier: subscribes to wake word stream, auto-starts mic on
  detection, pauses wake word during STT, resumes after

Dependencies:
- xyz.rementia:openwakeword:0.1.4 (Apache 2.0)
- com.github.gkonovalov.android-vad:silero:2.0.10 (MIT)
- ONNX Runtime forced to 1.23.0 across all deps

Assets:
- wakeword/hello_world.onnx (placeholder until hey_hark.onnx is trained)
- wakeword/melspectrogram.onnx + embedding_model.onnx (preprocessing)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
android-vad (Silero VAD wrapper) is hosted on JitPack, not Maven Central.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
android-vad is on JitPack. The plugin's allprojects block doesn't
propagate to the app module's dependency resolution, so JitPack must
be added to the app's android/build.gradle.kts as well.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Start wake word detection automatically once ChatNotifier finishes
initialization. This is temporary until a proper settings screen
with toggle is added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
openWakeWord's MelSpectrogram and EmbeddingModel classes hardcode
asset paths to root-level melspectrogram.onnx and embedding_model.onnx.
Moved preprocessing models to assets root, kept wake word model in
wakeword/ subfolder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
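Since openWakeWord's preprocessing classes expect their models at the assets root, a startup check can catch a misplaced file early. The sketch below is hypothetical (`WakeWordAssets` and `missing` are not from the PR) and uses the final model name shipped in this PR; on Android the `available` set could be built from `AssetManager.list(...)` calls.

```kotlin
// Hypothetical asset-layout check. Preprocessing models must sit at the
// assets root; the wake word model lives under wakeword/.
object WakeWordAssets {
    const val MEL_SPECTROGRAM = "melspectrogram.onnx"
    const val EMBEDDING = "embedding_model.onnx"
    const val WAKE_WORD_MODEL = "wakeword/hey_harkh.onnx"

    val required = listOf(MEL_SPECTROGRAM, EMBEDDING, WAKE_WORD_MODEL)

    // Returns the required asset paths that are absent from `available`.
    fun missing(available: Set<String>): List<String> =
        required.filterNot { it in available }
}
```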
Wake word was only resumed in cancelListening() but not in the STT
onDone callback path. When STT timed out naturally, wake word stayed
paused permanently. Now resumes in both paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
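The fix amounts to routing every way an STT session can end through one place that resumes wake word detection. The actual change is in Dart (ChatNotifier); this Kotlin sketch only illustrates the pattern, and `SttSession` and its methods are hypothetical names.

```kotlin
// Hypothetical sketch: both exit paths funnel through finish(), which
// resumes the wake word exactly once per session.
class SttSession(private val resumeWakeWord: () -> Unit) {
    private var finished = false

    fun cancelListening() = finish()  // user cancelled
    fun onDone() = finish()           // recognizer finished or timed out

    private fun finish() {
        if (finished) return
        finished = true
        resumeWakeWord()
    }
}
```

The `finished` guard keeps a late `onDone` after a cancel from resuming twice.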
Android only allows one AudioRecord at a time. The previous pause()
just suppressed callbacks but kept the engine's AudioRecord open,
preventing STT from acquiring the mic. Now pause() fully stops the
engine and releases the mic, resume() restarts it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detection scores range 0.515-0.726 for 'hello world'. Lowering
threshold from 0.5 to 0.3 for faster detection. Reducing cooldown
from 3s to 1.5s for quicker re-trigger after commands.

The main delay is the feature buffer rebuild after engine restart
(~10s of audio needed). This is inherent to the openWakeWord
architecture and will improve with the trained hey_hark model.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
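The threshold and cooldown values from this commit can be expressed as a small gate over per-frame scores. This is an illustrative sketch, not how openWakeWord or WakeWordDetector.kt necessarily applies them; `DetectionGate` and `onScore` are hypothetical names.

```kotlin
// Hypothetical gate: fire when score >= threshold and at least cooldownMs
// has elapsed since the previous detection.
class DetectionGate(
    private val threshold: Float = 0.3f,
    private val cooldownMs: Long = 1_500L,
) {
    private var lastFireMs: Long? = null

    fun onScore(score: Float, nowMs: Long): Boolean {
        if (score < threshold) return false
        val last = lastFireMs
        if (last != null && nowMs - last < cooldownMs) return false
        lastFireMs = nowMs
        return true
    }
}
```

With these defaults, a 0.515 score fires, and a 0.726 score 800 ms later is swallowed by the cooldown.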
…only

Instead of full stop/restart on pause/resume (which loses the 10-second
audio buffer and causes detection delay), keep the engine and AudioRecord
running continuously. Only suppress detection callbacks during STT.

Android's SpeechRecognizer manages its own audio session and can coexist
with AudioRecord on most devices. This eliminates the buffer rebuild
delay, making wake word detection resume instantly after commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
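In the callback-only approach, pausing just flips a flag while the engine and its `AudioRecord` keep running, so the feature buffer never has to be rebuilt. The tradeoff is that the mic stays held, which the summary reports as conflicting with STT on the Moto G56. This Kotlin sketch is hypothetical; `SuppressingDetector` and `setPaused` stand in for whatever `setWakeWordPaused` delegates to.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch: the engine stays open; pause only suppresses callbacks.
// AtomicBoolean because detection callbacks arrive on the audio thread.
class SuppressingDetector(private val onDetected: () -> Unit) {
    private val suppressed = AtomicBoolean(false)

    fun setPaused(paused: Boolean) = suppressed.set(paused)

    // Called from the engine's detection callback.
    fun onEngineDetection() {
        if (!suppressed.get()) onDetected()
    }
}
```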
Document three phases of wake word implementation:
- Phase 1 (current): in-app detection with openWakeWord
- Phase 2: background foreground service with VAD gating
- Phase 3: continuous listening session with echo cancellation,
  barge-in, and no repeated wake word needed

Phase 3 research covers AcousticEchoCanceler, VOICE_COMMUNICATION
audio source, WebRTC AEC, and ChatGPT voice mode observations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
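Phase 2's VAD gating could take the form of a per-frame gate that only lets audio through to wake word inference while speech is present, plus a short hangover so trailing syllables are not cut off. This is a hypothetical sketch: `VadGate` and `hangoverFrames` are illustrative names, and the `isSpeech` flag would come from the Silero VAD, whose API is not shown here. Whether skipped frames should still feed openWakeWord's feature buffer is a separate design question.

```kotlin
// Hypothetical VAD gate with a hangover of N frames after speech ends.
class VadGate(private val hangoverFrames: Int = 5) {
    private var remaining = 0

    // Returns true if this frame should be passed to the wake word engine.
    fun shouldProcess(isSpeech: Boolean): Boolean {
        if (isSpeech) {
            remaining = hangoverFrames
            return true
        }
        if (remaining > 0) {
            remaining--
            return true
        }
        return false
    }
}
```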
Replace hello_world.onnx placeholder with custom-trained "Hey Hark"
model (hey_harkh.onnx, 201KB). Trained via openWakeWord Colab notebook
with phonetic spelling "hey harkh" for better pronunciation matching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- AGENTS.md: added WakeWordDetector.kt to file tree, wake word assets,
  openWakeWord + android-vad to deps, updated API table
- README.md: added Wake Word section, updated roadmap bullets
- ROADMAP.md: wake word status changed to in-progress with 7 shipped
  and 6 remaining sub-items

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Removed hello_world.onnx (202KB) since hey_harkh.onnx is shipped
- Added TODO for silent failure on null context in startWakeWordService
- Added TODO for isRunning returning false during pause
- Added TODO for wake word resuming before TTS completes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>