Added script to compute phoneme labels and timestamps #528
base: master
Thank you, I'll try to merge it in the coming week.
Thanks @nshmyrev! As an update, in our latest commit we have implemented a separate function to compute word and phone results if you want to generate phone level results aligned with word results (with confidences). The result option can now be configured using |
Dear @nshmyrev , this functionality by @rutujaubale is fantastic! I believe that many people would hugely appreciate it if you merged it any time soon just as you have previously planned. Thank you very much for your efforts. |
… to disable MBR to generate phone outputs
…to output phone results
…_options for better consistency
9374f11 to 9f438fe
I found this pull request when I was looking into the same feature. I hope it can be merged soon, while it still has no conflicts with the base branch.
That would be greatly appreciated!
Hi @rutujaubale, I'm trying to rebuild Vosk with your modification, but I got an error while rebuilding it. It seems that the function you are using, Could you please let me know which version of Kaldi you are using? Or how to build Vosk with your modification?
@zhenxili96 I had to add #include "lat/lattice-functions-transition-model.h"
Thanks @mmende, it really helps.
Hello, I also found this pull request while looking for the same feature. I figured I would leave a comment (given it has been a while, and no official comment appears to have been made) since it would be extremely useful to have this sort of functionality! |
Hey @nshmyrev, I'd love to see this merged too! Is there anything I could help with to get it through? I've fixed the merge conflicts and tested on my branch here: https://github.com/Nathravorn/vosk-api I also fixed a few issues with this PR's code (most notably, the |
Is there any way to get both phonemes and words at the same time for Spanish? I checked the two available Spanish models, and neither of them has a phones.txt. Thanks, and I appreciate your help!
I would also love to see this merged. I've written automatic lip synch animation software based on Vosk using word timings. The algorithm makes guesses about the timings of phonemes. It works really great, which is a testament to Vosk. But the lip synching would be much better if Vosk could return the phoneme timings. |
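To illustrate the kind of guess described above, here is a minimal sketch (not the commenter's actual algorithm, which is not shown in this thread). It assumes word entries shaped like the Vosk word results ("word", "start", "end", "conf") and simply splits each word's interval evenly across its phones. The LEXICON lookup and the estimate_phone_timings helper are hypothetical names introduced only for this example.

```python
from typing import Dict, List

# Hypothetical pronunciation lookup; a real application would load a
# pronunciation dictionary instead of hard-coding entries.
LEXICON: Dict[str, List[str]] = {"hello": ["HH", "AH", "L", "OW"]}


def estimate_phone_timings(word: Dict, phones: List[str]) -> List[Dict]:
    """Split a word's [start, end] interval evenly across its phones.

    This is only a guess: real phones have very different durations,
    which is why decoder-provided phone timestamps would be better.
    """
    if not phones:
        return []
    start, end = word["start"], word["end"]
    step = (end - start) / len(phones)
    return [
        {"phone": p, "start": start + i * step, "end": start + (i + 1) * step}
        for i, p in enumerate(phones)
    ]


# One word entry in the shape of a Vosk word-level result.
word = {"word": "hello", "start": 0.5, "end": 0.9, "conf": 1.0}
timings = estimate_phone_timings(word, LEXICON[word["word"]])
for t in timings:
    print(f'{t["phone"]}: {t["start"]:.2f} to {t["end"]:.2f}')
```

An even split ignores the fact that vowels are usually longer than stops, so phone timestamps taken from the recognizer's own alignment should be considerably more accurate than this approximation.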
Hey, I am looking for this exact thing! Is it possible that this is open source? I am trying to add lip syncing to TTS by listening to the audio stream and parsing out the phonemes. There is a project, https://github.com/DanielSWolf/rhubarb-lip-sync, that parses the audio into visemes. I would love to be able to do that live. If Vosk were able to merge this feature, I would be able to get it working with an engine I'm already using.
Maybe just go ahead and open your own PR if you have a mergeable version of this code? I would love to see this feature released and to use it!
@kevin, Rhubarb looks cool. I will check it out more later.
This adds the ability to generate phone labels and timestamps in the Vosk recognizer output.
Output looks like