Flashlight and Pyctcdecode decoders #8428
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
python setup.py bdist_wheel
pip install dist/*.whl
cd ..
export USE_KENLM=1 && pip install flashlight-text
If you install the PyPI package for the Flashlight decoders, the maximum supported n-gram order is 6. If you try to use a higher-order model, you'll see the error below, even if you built KenLM with support for higher orders.
This model has order 10 but KenLM was compiled to support up to 6. If your build system supports changing KENLM_MAX_ORDER, change it there and recompile. With cmake:
cmake -DKENLM_MAX_ORDER=10 ..
With Moses:
bjam --max-kenlm-order=10 -a
Otherwise, edit lm/max_order.hh.
Yes, this is what I fixed for the other decoders, but not for Flashlight. Let's fix it next time.
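One way to catch the order limit discussed above before decoding starts is to read the n-gram order from the header of a text ARPA file and compare it with the limit. This is only a sketch: the helper names `arpa_order` and `check_order` are hypothetical and not part of NeMo, Flashlight, or KenLM, the limit of 6 is taken from the review comment, and binary KenLM files are not handled.

```python
import re

# Limit for the PyPI Flashlight package, as reported in the review comment.
FLASHLIGHT_PYPI_MAX_ORDER = 6


def arpa_order(lines):
    """Return the highest n-gram order declared in an ARPA header.

    `lines` is any iterable of text lines, e.g. an open ARPA file.
    Returns 0 if no "\\data\\" header is found.
    """
    order = 0
    in_header = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue
        match = re.match(r"ngram\s+(\d+)\s*=\s*\d+$", line)
        if match:
            order = max(order, int(match.group(1)))
        elif line.startswith("\\"):
            break  # the first "\1-grams:" section ends the header
    return order


def check_order(lines, max_order=FLASHLIGHT_PYPI_MAX_ORDER):
    """Raise ValueError if the model order exceeds max_order (hypothetical helper)."""
    order = arpa_order(lines)
    if order > max_order:
        raise ValueError(
            f"model has order {order} but the decoder supports up to {max_order}"
        )
    return order
```

With a real model this could be called as `check_order(open(path, encoding="utf-8"))`, where `path` points at a text ARPA file.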
Preserve Flashlight and Pyctcdecode beamsearch with Ngram LM
Support Flashlight and Pyctcdecode decoding with pure KenLM and NeMo KenLM
Standardize API of CLI inference scripts
Collection: ASR
Changelog
-- Get logprobs from Hypothesis
-- Use "pyctcdecode" strategy as default beamsearch algorithm denoted as "beam"
-- Remove default seq2seq strategy
-- Remove kenlm_path; add word_kenlm_path (model built with lmplz) and nemo_kenlm_path (model built with train_kenlm.py)
-- Check decoding_type and search_type combinations
-- Support empty string in nemo_kenlm_path and word_kenlm_path for beamsearch without LM (ZeroLM)
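The path handling in the changelog can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `LMChoice` and `choose_lm` are hypothetical names, and rejecting the case where both paths are set is an assumption, since the changelog does not say how that case is handled. Only the parameter names `word_kenlm_path` and `nemo_kenlm_path` and the empty-string convention come from the changelog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LMChoice:
    """Hypothetical result type: which LM, if any, beam search should use."""
    kind: str             # "word", "nemo", or "none"
    path: Optional[str]   # None when decoding without an LM


def choose_lm(word_kenlm_path: str = "", nemo_kenlm_path: str = "") -> LMChoice:
    """Pick the LM from the two path options; empty strings mean no LM."""
    if word_kenlm_path and nemo_kenlm_path:
        # Assumption: the two options are treated as mutually exclusive.
        raise ValueError("set only one of word_kenlm_path and nemo_kenlm_path")
    if word_kenlm_path:
        return LMChoice("word", word_kenlm_path)
    if nemo_kenlm_path:
        return LMChoice("nemo", nemo_kenlm_path)
    # Both empty: beam search without an LM (ZeroLM in the changelog).
    return LMChoice("none", None)
```

Returning an explicit "none" choice, rather than `None`, keeps the no-LM case visible to the caller, which then decides how to build the decoder.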