
Add wake word support #5

Closed · edurenye opened this issue Sep 6, 2023 · 24 comments
Labels: enhancement (New feature or request)
@edurenye

edurenye commented Sep 6, 2023

Now that core has added the wake word integration (home-assistant/core#96380), this component could make use of it after running the voice activity detector.

It could be integrated using the Wyoming protocol, since that core integration also provides a Wyoming implementation that can be used with an openWakeWord container.
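For reference, a Wyoming openWakeWord server is typically run as a container along these lines (a sketch only; verify the image name, port, and flags against the rhasspy/wyoming-openwakeword documentation before relying on them):

```shell
# Sketch: run an openWakeWord server that speaks the Wyoming protocol.
# Image name, port, and flag names here are assumptions drawn from the
# rhasspy project layout, not confirmed by this thread.
docker run -d --name wyoming-openwakeword \
  -p 10400:10400 \
  rhasspy/wyoming-openwakeword \
  --uri 'tcp://0.0.0.0:10400' \
  --preload-model 'ok_nabu'
```

Home Assistant's Wyoming integration would then be pointed at the host and port above.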

@starsoccer

+1 on this. @AlexxIT, any plans to add support for this? It seems like the missing piece.

@balloob

balloob commented Oct 13, 2023

Without having looked at the code, you can probably test it out by changing this line locally to PipelineStage.WAKE_WORD:

start_stage=PipelineStage.STT,
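In other words, the pipeline's entry point moves from speech-to-text to wake word detection. A minimal sketch of the idea (the `PipelineStage` enum below is a stand-in for the real one in `homeassistant.components.assist_pipeline`, reproduced here only so the snippet is self-contained):

```python
from enum import Enum


class PipelineStage(Enum):
    """Stand-in for homeassistant.components.assist_pipeline.PipelineStage."""

    WAKE_WORD = "wake_word"
    STT = "stt"
    INTENT = "intent"
    TTS = "tts"


# Before: the pipeline starts at speech-to-text, so no wake word is detected.
start_stage = PipelineStage.STT
# After (balloob's suggested local change): start at the wake word stage,
# so the audio stream is monitored continuously for the wake word.
start_stage = PipelineStage.WAKE_WORD
```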

@synesthesiam

See the updated Assist docs here for wake word detection: https://developers.home-assistant.io/docs/voice/pipelines/#wake-word-detection

You will also have the option of passing in parameters for audio enhancement, so HA can clean up noise and boost the volume if needed.

@starsoccer

So I made a fork here, https://github.com/starsoccer/StreamAssist, and tried to get this working, but did not have much luck. I will prefix all of the following by saying that while I am a developer, I don't really know Python, but anyway, theory/details below.

So I made the change mentioned by @balloob to use WAKE_WORD, but it seems the parameters for vad.process have changed in Home Assistant. Previously the code just sent a chunk of audio, but now it expects a second parameter, is_speech, which seems to be a boolean. I tried setting it to both True and False, but neither seems to work, and both end up generating the error below, which I am a bit lost at:

2023-10-19 10:22:53.907 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
File "/config/custom_components/stream_assist/switch.py", line 152, in async_process_audio_stream
async for _ in self.audio_stream(self.close):
File "/config/custom_components/stream_assist/switch.py", line 110, in audio_stream
if not self.vad.process(chunk, False):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/vad.py", line 155, in process
self._timeout_seconds_left -= chunk_seconds
TypeError: unsupported operand type(s) for -=: 'float' and 'bytes'

@synesthesiam As long as you're on this thread, are there any plans to support something like this natively in HA? I think a lot of people likely have cameras/microphones, and it would just take a simple UI that lets you make an Assist satellite consisting of an input stream (ideally a microphone, or a video stream) and a media player to play the voice back to.

@tbrasser

I guess something like these changes need to be done: home-assistant/core@7856189#diff-4ff817d7964242e3c079f2f2799985713b8a4983de705b6fcf620542fe5897ff

@starsoccer

> I guess something like these changes need to be done: home-assistant/core@7856189#diff-4ff817d7964242e3c079f2f2799985713b8a4983de705b6fcf620542fe5897ff

Oh, good find. Yeah, that shows it previously took a chunk of audio, but now it wants a float for the chunk seconds. Honestly, I am not sure this is even the right function to use anymore. I don't really understand the order things should happen in, but my thinking is that maybe I should instead be calling process_with_vad; then I can continue to pass in the audio chunk and simply need to figure out how to create a `VoiceActivityDetector`.

What's not really clear to me is, if I continue to use this process function and instead pass in the chunk time, how it's actually going to get the audio chunk, as I don't see it being passed in anywhere else. Even the example test files seem to just call it without any audio, which I don't really get: https://github.com/home-assistant/core/blob/22c21fdc180fec24e3a45e038aba6fb685acd776/tests/components/assist_pipeline/test_vad.py#L33C48-L33C48

@synesthesiam

@starsoccer The process function is now used when an external VAD has already been used. This was done to avoid running VAD twice for the same audio chunk in a pipeline with wake and STT.

Let me know if you have any more questions, since I wrote the code 😄
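The split described above can be sketched with a simplified segmenter (this is a stand-in for illustration, not the real `VoiceCommandSegmenter` in `homeassistant/components/assist_pipeline/vad.py`; method names and timings are assumptions):

```python
SAMPLE_RATE = 16_000  # assumes 16 kHz, 16-bit mono PCM audio


class Segmenter:
    """Tracks speech/silence timing; the VAD decision is made externally."""

    def __init__(self, silence_seconds: float = 0.5) -> None:
        self.silence_seconds = silence_seconds
        self._silence_left = silence_seconds
        self.in_command = False

    def process(self, chunk_seconds: float, is_speech: bool) -> bool:
        """New-style entry: the external VAD already classified this chunk.

        Returns False once the trailing silence budget is used up,
        i.e. the voice command has ended.
        """
        if is_speech:
            self.in_command = True
            self._silence_left = self.silence_seconds  # reset on speech
        elif self.in_command:
            self._silence_left -= chunk_seconds
            if self._silence_left <= 0:
                return False  # command finished
        return True

    def process_chunk(self, chunk: bytes, vad) -> bool:
        """Convenience path: run the VAD callable here, then delegate.

        Avoids running VAD twice when wake word and STT share one pipeline.
        """
        chunk_seconds = len(chunk) / (SAMPLE_RATE * 2)  # 2 bytes per sample
        return self.process(chunk_seconds, vad(chunk))
```

So the caller never passes raw audio to `process` at all; it passes only the chunk's duration plus the VAD verdict, which is why the linked tests call it without audio.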

@starsoccer

> @starsoccer The process function is now used when an external VAD has already been used. This was done to avoid running VAD twice for the same audio chunk in a pipeline with wake and STT.
>
> Let me know if you have any more questions, since I wrote the code 😄

Got it. Do you have an example of how the audio is passed to the function that I could maybe work from?

Also, are there any plans to build this functionality into HA directly, rather than needing this custom integration?

@AlexxIT AlexxIT added the enhancement New feature or request label Oct 24, 2023
@TheRealSimon42

Any updates on this so far? It would be REALLY great to have the possibility of getting wall panels, which already have an internal mic, to work as an Assist device with wake word 😍

@starsoccer

+1, I would love to keep working on this, but it's honestly a bit outside my skill set. Hopefully HA builds this feature in natively, letting users specify an input device, either video or microphone, and then allowing any speaker to function as an output.

@tbrasser

Asked about this in the Year of the Voice chapter 5 live chat but got no response (or I missed it). I was thinking this could maybe also be integrated into Frigate, seeing there's VAD going on in Frigate 0.13.

@starsoccer

Yeah, I made an issue about putting it into Frigate, but was told this isn't planned: blakeblackshear/frigate#8644

Hopefully someone more technical than me will get this working. I've asked in the Discord and seen someone else ask about it, but so far I don't know anyone who has this working.

@tbrasser

Seems this can be a starting point! https://github.com/asmsaifs/StreamAssist

@starsoccer

> Seems this can be a starting point! https://github.com/asmsaifs/StreamAssist

Cool, I tried using it but I'm getting this vague error:

User input malformed: two or more values in the same group of exclusion 'url' @ data[<url>]

Not sure if I am missing something or not, but any value I put in the URL for the stream seems to give this error.

@AlexxIT AlexxIT self-assigned this Jan 26, 2024
@AlexxIT
Owner

AlexxIT commented Feb 12, 2024

It is in testing in the latest main/master version.

@starsoccer

@AlexxIT Do you have any info on how we can use the new version and test/debug it? I gave it a try but can't seem to get anything to happen. I've tried using an RTSP URL as well as a camera entity. They both seem to just get stuck in the start phase for wake and then never change.

@AlexxIT
Owner

AlexxIT commented Feb 12, 2024

Reinstall via HACS, manually selecting the main version tag.

@starsoccer

> Reinstall via HACS with manual selecting main version tag

I already did that.

@tbrasser

It works flawlessly! Awesome! My tablets around the house just got superpowers!

@AlexxIT
Owner

AlexxIT commented Feb 16, 2024

Thanks. Unfortunately I don't have time to do a complete test. Also, it all works just horribly in my language.
So hopefully it's working fine for you. Let me know if you have any problems.

https://github.com/AlexxIT/StreamAssist/releases/tag/v2.0.0

@AlexxIT AlexxIT closed this as completed Feb 16, 2024
@starsoccer

Can you add some more troubleshooting info? For instance, how to ensure it's detecting voice and actually gets past the wake word stage/step, as that is where mine seems to be stuck.

@AlexxIT
Owner

AlexxIT commented Feb 17, 2024

You can enable debug logs.
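In Home Assistant that means adding something like this to `configuration.yaml` using the standard logger integration syntax (the `custom_components.stream_assist` key matches the component path shown in the traceback earlier in this thread):

```yaml
logger:
  default: warning
  logs:
    custom_components.stream_assist: debug
```

After a restart, the component's debug output, including pipeline stage transitions, shows up in the Home Assistant log.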

@tbrasser

This should get a shout out on year of the voice part 6!

I'm using this on Amazon Fire HD 10+ tablets running IP Webcam for the RTSP stream and Fully Kiosk for the media_player, and it works out of the box! (Mostly using the Extended OpenAI Conversation agent; while not local, the results are so impressive that having to work with intents seems puny.)

Small FR/Q: should these show up as Assist devices?

@AlexxIT
Owner

AlexxIT commented Feb 17, 2024

I have never seen what an Assist device is.
I have ordered the M5Stack Official ATOM Echo Smart Speaker to check how the default pipeline works.

7 participants