New force-alignment API and two-pass alignment to get phone/state durations #300
Excited to check this out! I'm at Interspeech and out of phase by half a day and all, but I'll take a look shortly.
No problem! The CLI for state alignment isn't quite there yet, but it's coming soon (tonight, I hope).
Fantastic! I also hope to try this out ASAP. I wonder whether constraining
to the first pass's word boundaries will help. It seems like it can't hurt,
but it would be interesting to measure how much.
On Wed, Sep 21, 2022 at 3:42 PM David Huggins-Daines wrote:
> No problem! The CLI for state alignment isn't quite there yet, but coming soon (tonight, I hope).
It will definitely make the alignment faster. It may also make it more accurate, though I am not certain of this. I have to look at how I implemented this back in 2006: https://www.cs.cmu.edu/~dhuggins/Publications/phlab.pdf EDIT: that paper was about forward-backward and not alignment, so it is not the same thing at all. In that case I implemented something like semi-Viterbi training, setting "impossible" phone sequences to zero probability, which resulted in models that were better for alignment (but somewhat worse for recognition).
Note that we *won't* do state alignment here for the moment, as it is dubiously useful unless you are doing unsupervised MLLR, which should get a specific implementation.
Hoping for state level alignments, and frame level scores also, but LGTM and WFM
State level alignments are already there in the Python API; look at cython/test/alignment_test.py for an example. It is now easy to add them to the command-line front-end as well, so I'll do that (not on by default, though).
The bestpath search is not suitable for force-alignment, as it removes internal silences. It also sometimes produces bogus segments which are incompatible with state alignment and cause it to crash. For the moment you should never use bestpath search for alignment.
THEY WERE TOTALLY BOGUS OMG! The word IDs were not converted! Who did that?!?!?
Now you can (relatively) easily do a second pass of alignment to get phone durations after decoding or word alignment.
Also, word alignment now uses FSG search, like SoundSwallower, so it's really fast and also handles silence and alternate pronunciations for you.
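For readers who want to see the shape of the two-pass flow described above, here is a minimal sketch in Python. It assumes the method names from the released PocketSphinx 5 Python module (`set_align_text`, `set_alignment`, `get_alignment`, `process_raw` with `full_utt=True`) and the `name`, `start`, and `duration` attributes on alignment entries. The API at the time of this PR may have differed, and the helper `two_pass_align` is hypothetical, not part of the library.

```python
def two_pass_align(decoder, text, audio):
    """Hypothetical helper: word alignment first, then phone alignment.

    `decoder` is assumed to behave like a pocketsphinx.Decoder, and
    `audio` is assumed to be raw PCM bytes in the format the decoder's
    acoustic model expects. Returns a list of
    (word, start, duration, [(phone, start, duration), ...]) tuples.
    """
    def run_pass():
        # Process the whole utterance in one call.
        decoder.start_utt()
        decoder.process_raw(audio, full_utt=True)
        decoder.end_utt()

    # Pass 1: align the transcript to the audio at the word level.
    decoder.set_align_text(text)
    run_pass()

    # Pass 2: switch to alignment search, constrained by the pass-1
    # result, to obtain phone (and state) durations.
    decoder.set_alignment()
    run_pass()

    result = []
    for word in decoder.get_alignment():
        # Iterating over a word entry is assumed to yield its phones.
        phones = [(p.name, p.start, p.duration) for p in word]
        result.append((word.name, word.start, word.duration, phones))
    return result
```

With the real library, `decoder` would be a `pocketsphinx.Decoder` instance and `audio` the raw sample bytes of a single utterance. Iterating over each phone entry should in turn give the state-level entries mentioned earlier in the thread, though that is not exercised here.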