
Wake up word trigger #10

Open
thomasjoubert opened this issue Apr 9, 2024 · 6 comments

@thomasjoubert

Hi!
One of my goals is to be able to trigger an animation based on a keyword or smiley included in the LLM answer.
That way I could animate expressive faces in OBS, for instance via websockets, with the script being triggered by the smiley (it's ignored by ElevenLabs).
Not sure if this is the right project - it might fit better in linguflex as a module, but is that possible?
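
For the OBS side, a minimal sketch of what such a trigger could look like, assuming OBS with the obs-websocket v5 plugin enabled and the obsws-python client library; host, port, password, and scene names are placeholders, not anything from either project:

```python
# Minimal sketch: switch OBS to a scene representing a facial expression,
# assuming the obs-websocket v5 plugin and the obsws-python client library
# (pip install obsws-python). Connection details are placeholders.
import obsws_python as obs

def trigger_animation(scene_name: str) -> None:
    # Connect to the local OBS websocket server and switch scenes.
    client = obs.ReqClient(host="localhost", port=4455, password="secret")
    client.set_current_program_scene(scene_name)

trigger_animation("happy_face")
```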

@KoljaB
Owner

KoljaB commented Apr 9, 2024

Should be quite easy to do in both projects. The easy way is to filter the incoming tokens for emoticons and then trigger the animation. A better, more reliable way would probably be to use a structured output library like instructor and force the LLM to fill out a pydantic field with the desired expression.
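
For the first, simpler approach, a rough sketch (not code from either project) of filtering a streamed token sequence for emoticons; the emoticon-to-animation mapping and the trigger_animation callback are illustrative assumptions:

```python
# Sketch of the token-filtering approach: watch the streamed LLM tokens for
# emoticons, fire an animation trigger when one appears, and strip the
# emoticon from the text passed on to TTS. The emoticon set is illustrative.
EMOTICON_ANIMATIONS = {
    ":)": "happy_face",
    ":(": "sad_face",
    ":D": "laughing_face",
}

def filter_tokens(token_stream, trigger_animation):
    """Yield cleaned text for TTS while firing triggers on emoticons."""
    buffer = ""
    for token in token_stream:
        buffer += token
        for emoticon, animation in EMOTICON_ANIMATIONS.items():
            if emoticon in buffer:
                trigger_animation(animation)
                buffer = buffer.replace(emoticon, "")
        # flush text that can no longer be part of a split emoticon
        if len(buffer) > 4:
            yield buffer[:-4]
            buffer = buffer[-4:]
    if buffer:
        yield buffer
```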

@thomasjoubert
Author

ok, it might be above my skills :) I couldn't find pseudo-code logic for making sure the expressive face scene is triggered while the matching speech is played. Just to be sure I explained myself correctly (re-reading my question, I'm not sure I did):
What I want is to ask the AI to provide an answer that includes emotion cues, so that the face changes while the answer is spoken.
The difficulty I see is that the cues must trigger the face change at the moment the audio is played.
My first thought was to split the answer into chunks, name each chunk after its emotion, and have the code read the filename when the file is played and send a trigger to OBS - but maybe there is an easier way (a rough sketch of that idea follows below).
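
```python
# Rough sketch of the chunk/filename idea: audio chunks are named after their
# emotion (e.g. "chunk02_happy.wav") and the player fires the OBS trigger
# before playing each file. play_audio is a hypothetical playback helper.
import os

def play_chunks(chunk_files, trigger_animation, play_audio):
    for path in chunk_files:
        stem = os.path.splitext(os.path.basename(path))[0]
        emotion = stem.split("_")[-1]   # "chunk02_happy" -> "happy"
        trigger_animation(emotion)
        play_audio(path)
```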

@KoljaB
Owner

KoljaB commented Apr 9, 2024

> What I wish to do is to ask the AI to provide an answer including emotion cues in it so that while speaking we would see the face change.

This is what structured output libraries are meant for. Instead of trying to filter the emotion cues out of one single big LLM response, a library like instructor can split the LLM answer into multiple parts. It could send a sentence and then an emotion cue together with that sentence, so that you get an expression for every sentence of the LLM's answer. You could also restrict the LLM to respond only with certain emotion cues that you define beforehand. I think this would be the gold standard for realizing your idea.
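
As an illustration of what is described above, a sketch using instructor with an OpenAI-compatible client; the model name and the set of allowed emotions are assumptions, not project code:

```python
# Sketch of the structured-output approach with instructor
# (pip install instructor); model name and emotion set are illustrative.
from typing import List, Literal

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Restrict the LLM to a predefined set of emotion cues.
Emotion = Literal["neutral", "happy", "sad", "surprised", "angry"]

class Segment(BaseModel):
    sentence: str
    emotion: Emotion

class Answer(BaseModel):
    segments: List[Segment]

client = instructor.from_openai(OpenAI())

answer = client.chat.completions.create(
    model="gpt-4",
    response_model=Answer,
    messages=[{"role": "user", "content": "Tell me a short story."}],
)

for segment in answer.segments:
    print(segment.emotion, "->", segment.sentence)
```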

> The difficulty I see is that the cues must trigger the face change when the audio is read.

That would be the way which requires analyzing the "standard output" of the LLM. I don't see this being very reliable. It's the classic "we beg the LLM to include certain stuff in the output without being sure that it does" approach, which also involves parsing the output. Doable, but not the state-of-the-art way to achieve what you want.

@thomasjoubert
Author

Thank you for taking the time to answer. So as I understand it, the first option would be the gold standard, but I'm afraid of it adding latency - would it be one instruction at a time? Or would it be one big answer containing multiple sentences along with their emotions?
Anyway, it's aiming a bit too high for my skills yet, but it would be cool to have this expressive module one day - one that triggers whatever the user wants (could be an LED color, eye expression, face change...).

@KoljaB
Owner

KoljaB commented Apr 9, 2024

With instructor you could make the LLM send a list of sentence/emotion pairs and stream everything back token by token, so only minimal latency would be added. I've been thinking about an upgrade to my LocalAIVoiceChat project, where I plan to do this with different voice references for every emotion.
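
Building on the earlier instructor sketch, the streaming variant could look roughly like this, using instructor's Partial response models; speak() is a hypothetical TTS call, and the Answer model, client, and trigger_animation come from the sketches above:

```python
# Streaming sketch with instructor's Partial response models: partial Answer
# objects arrive as tokens stream in, so each completed sentence/emotion pair
# can be dispatched immediately instead of waiting for the full answer.
stream = client.chat.completions.create(
    model="gpt-4",
    response_model=instructor.Partial[Answer],
    stream=True,
    messages=[{"role": "user", "content": "Tell me a short story."}],
)

done = 0
for partial in stream:
    segments = partial.segments or []
    # every segment except the last is complete; the last may still be growing
    while done < len(segments) - 1:
        seg = segments[done]
        trigger_animation(seg.emotion)
        speak(seg.sentence)  # hypothetical TTS call
        done += 1
```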

@KoljaB
Owner

KoljaB commented Apr 9, 2024

Look here (watch the little clip)
