Add delay until STT start media finishes playing #11

Open
relust opened this issue Feb 20, 2024 · 14 comments
Labels
enhancement New feature or request

Comments

@relust

relust commented Feb 20, 2024

Hello. Great job. I was waiting for wake word support in Stream Assist and I'm glad you managed to do it. My problem is that for "STT start media" I want to use personalized random responses like "Yes, I'm listening", "How can I assist you?", etc. Because VAD is too aggressive, it also records part of the response "Yes, I'm listening", which produces an error response saying it did not understand the request. I tried an automation that turns off the microphone switch for a second when the wake word is detected and then turns it back on, but it doesn't start listening again. Can you make it possible to set a delay between wake word detection and STT listening?

@AlexxIT AlexxIT added the question Further information is requested label Feb 20, 2024
@AlexxIT
Owner

AlexxIT commented Feb 20, 2024

StreamAssist uses the default Assist Pipeline component. It has some settings, but I don't really understand them :)
https://github.com/home-assistant/core/blob/54d005a3b8a5beaaf912a37b89ceab78694bd9db/homeassistant/components/assist_pipeline/pipeline.py#L447-L457

Also, detecting that the player has finished playing can be a problem across all kinds of media players.

@relust
Author

relust commented Feb 20, 2024

The Assist Microphone add-on and Wyoming Satellite on a Raspberry Pi do not have this problem. They wait for the wake response to finish playing and then start listening. So there is something like that in the code, but we have to figure out where. And the satellite on the ESP32 has three levels of end-of-speech detection (Default, Relaxed and Aggressive).

@AlexxIT
Owner

AlexxIT commented Feb 20, 2024

@AlexxIT
Owner

AlexxIT commented Feb 20, 2024

I get the idea. I don't know if I'll have time to implement this.

@AlexxIT AlexxIT added enhancement New feature or request and removed question Further information is requested labels Feb 20, 2024
@relust
Author

relust commented Feb 22, 2024

I found a possible solution to this problem:

  • In stream_assist/core/__init__.py, play the wake sound on WAKE_WORD_END instead of STT_START, because when a continuous conversation starts I don't want it to say the same thing as when I call it by name.
  • Create an asynchronous task that pauses for 0.1 seconds after giving the command to play the wake sound, with await asyncio.sleep(0.1), then add a blocking pause with time.sleep() to stop all the code until the wake sound finishes playing.
  • The problem that still needs to be solved is that VAD is too aggressive. We should find a way to tell Home Assistant to use Relaxed VAD. I think this can be done where the settings are sent to the pipeline.
# 2. Setup Pipeline Run
#...
        if event.type == PipelineEventType.WAKE_WORD_END:
            if player_entity_id and (media_id := data.get("stt_start_media")):
                # We schedule the execution of the asynchronous function in the background
                asyncio.create_task(async_play_media_and_pause(hass, player_entity_id, media_id))
#... at the bottom of the script
async def async_play_media_and_pause(hass, player_entity_id, media_id):
    play_media(hass, player_entity_id, media_id, "audio")
    await asyncio.sleep(0.1)  # We add an asynchronous pause of 100 ms
    time.sleep(5)  # We add a blocking pause that can be adjusted to how long the awake sentence is

@AlexxIT
Owner

AlexxIT commented Feb 22, 2024

Blocking the event loop is a very bad idea. You are blocking the whole of Hass.

I know what can be done. I can stop forwarding the audio stream from the source to the pipeline for some time.
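
The idea of pausing audio forwarding could be sketched roughly as follows. This is only an illustration, not StreamAssist code: `AudioGate`, `mute_for` and the injectable `clock` are hypothetical names, and it assumes the audio reaches the pipeline as an async iterator of byte chunks. Whether the pipeline tolerates the resulting gap in the audio is not established in this thread.

```python
import time
from typing import AsyncIterator, Callable


class AudioGate:
    """Drops audio chunks while muted (hypothetical helper, not part of StreamAssist)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._muted_until = 0.0

    def mute_for(self, seconds: float) -> None:
        # Call this when the wake sound starts playing.
        self._muted_until = self._clock() + seconds

    @property
    def muted(self) -> bool:
        return self._clock() < self._muted_until

    async def filter(self, source: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
        # Forward chunks to the consumer only while the gate is open.
        # Nothing here blocks the event loop: muted chunks are simply skipped.
        async for chunk in source:
            if not self.muted:
                yield chunk
```

With something like this, the handler for the wake event would call `gate.mute_for(duration)` and the pipeline would consume `gate.filter(source)` instead of the raw source. The mute duration still has to be chosen by hand unless the end of playback can be detected.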

@relust
Author

relust commented Feb 22, 2024

I didn't realize that it blocks the whole of Hass. Anyway, it doesn't really work: for some reason it starts recording as soon as the wake word is detected, then blocks, which delays the VAD, and the commands are not recognized. Stopping audio stream forwarding would be a much better solution.
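
For readers wondering why the blocking pause stalls everything, here is a small standalone demonstration. It does not use Home Assistant; `count_ticks`, `blocking_pause` and `cooperative_pause` are names made up for this example. A background ticker stands in for "everything else running on the event loop", and the demo counts how often it gets to run during each kind of pause.

```python
import asyncio
import time


async def count_ticks(pause) -> int:
    """Count how often a background ticker runs while `pause()` is awaited."""
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0)  # give the ticker a chance to start
    await pause()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return ticks


async def blocking_pause():
    time.sleep(0.2)  # never yields: the whole event loop is frozen


async def cooperative_pause():
    await asyncio.sleep(0.2)  # yields: other tasks keep running


def run_demo():
    blocked = asyncio.run(count_ticks(blocking_pause))
    cooperative = asyncio.run(count_ticks(cooperative_pause))
    return blocked, cooperative
```

During `time.sleep()` the ticker should not advance at all, while during `asyncio.sleep()` it keeps ticking. In Home Assistant the "ticker" would be every other integration and automation sharing the loop.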

@relust
Author

relust commented Feb 26, 2024

@AlexxIT, could you please find a solution to this problem? I want to add visual responses instead of beeps in this integration, and unless I can mute the microphone or delay listening I can't use such responses, because it records them and no longer recognizes commands.

@AlexxIT
Owner

AlexxIT commented Feb 27, 2024

I don't have time for this in the near future.

@relust
Author

relust commented Mar 5, 2024

I added a browser_mod popup with a GIF, and I need the player state so I can close the popup when the response finishes playing, but I'm not getting "player_entity_id" from the args. @AlexxIT, can you tell me how I could do it?

        elif event.type == PipelineEventType.TTS_END:
            if player_entity_id:
                tts = event.data["tts_output"]
                play_media(hass, player_entity_id, tts["url"], tts["mime_type"])
                if media_id := data.get("speech_gif"):
                    show_popup(hass, player_entity_id, media_id, "picture", browser_id)
                # Close the popup in the background once playback has finished
                asyncio.create_task(async_delay_close_popup(hass, player_entity_id, browser_id))

######################################################

async def async_delay_close_popup(hass, player_entity_id, browser_id):
    # Give the player a moment to leave "idle" after play_media is called
    await asyncio.sleep(1)

    # Poll until the player reports "idle" again
    while True:
        state = hass.states.get(player_entity_id)
        # hass.states.get() returns None for an unknown entity ID
        if state is None or state.state == "idle":
            break
        await asyncio.sleep(0.1)

    close_popup(hass, player_entity_id, browser_id)

##################################################

def close_popup(hass: HomeAssistant, player_entity_id: str, browser_id: str):
    service_data = {
        "entity_id": player_entity_id,
        "browser_id": browser_id,
    }

    coro = hass.services.async_call("browser_mod", "close_popup", service_data)
    hass.async_create_background_task(coro, "stream_assist_close_popup")

If I use the name of the player directly, it works, but not when I take it from the args:
player_state = hass.states.get("media_player.ha_display2_browser").state
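
The polling loop above has no upper bound, so it would spin forever if the player never reports "idle" (or if the entity ID does not resolve). A minimal sketch of the same wait with a timeout is shown here. `wait_until_idle` and its `get_state` callable are hypothetical names for this example; in the integration, `get_state` would wrap the `hass.states.get(...)` lookup and return the state string, or `None` when the entity is unknown.

```python
import asyncio
from typing import Callable, Optional


async def wait_until_idle(
    get_state: Callable[[], Optional[str]],
    timeout: float = 30.0,
    poll_interval: float = 0.1,
) -> bool:
    """Return True once get_state() reports "idle", False if the timeout expires."""

    async def poll() -> None:
        # A None state (unknown entity) never equals "idle",
        # so the timeout below is what ends the wait in that case.
        while get_state() != "idle":
            await asyncio.sleep(poll_interval)

    try:
        await asyncio.wait_for(poll(), timeout)
    except asyncio.TimeoutError:
        return False
    return True
```

A `False` result is a useful debugging signal here: it separates "the player never went idle" from "the entity ID passed in was not the one expected".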

@AlexxIT
Owner

AlexxIT commented Mar 6, 2024

I'm not sure what args you're talking about. I have never used browser mod. I don't understand your code.

@relust
Author

relust commented Mar 6, 2024

I just need to get the name of the player selected in the GUI, the one the responses play on, so that the popup closes when the response is done playing.
It works when I put the player name directly in the code, player_state = hass.states.get("media_player.ha_display2_browser").state, and I need to replace that with the player set in the graphical interface so that the player selector works: player_state = hass.states.get(player_entity_id).state
I don't know why the player name isn't passed in, or maybe it isn't passed in a format that works in this call. player_entity_id comes from the function arguments (hass, player_entity_id, media_id, "picture").

@AlexxIT
Owner

AlexxIT commented Mar 6, 2024

I don't understand where you're trying to get the player_entity_id var from.

@janstadt

janstadt commented Sep 9, 2024

Did this ever get taken care of? I noticed the VAD is way too aggressive as well, and depending on how quickly the mp3 you play as start media finishes, VAD is already over and the conversation agent cancels the request.
