Context-biasing by CTC-based Word Spotter (CTC-WS) #8223
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
artbataev
left a comment
Thanks for the work and tests!
Minor comments, mostly related to style.
assert len(ws_results) == 0

# with context biasing
ws_results = context_biasing.run_word_spotter(
Cool tests to cover primary functionality! Thanks!
artbataev
left a comment
LGTM. Thank you!
* initial commit
* some fixes
* fix blank_idx slow down
* add new non-blank pruning
* some fixes
* some fixes
* descriptions fix
* description fixes
* add ctc only model
* move scripts to nemo asr parts
* remove scripts from scripts dir
* add first test
* add some tests
* add test
* some fixes
* fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix circular import
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix preds_output_folder
* set loop_lables=True
* add .json to output manifest name
* fix rnnt wer degradation
* add round(score) for test
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix bow token
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* review fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* review fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* add epsilon shift in alignment
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* minor fix
* fix transcribe modification
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* fix autocast

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <ameister@nvidia.com>
What does this PR do?
Adds a new context-biasing method for CTC and Transducer (RNNT) models, based on a CTC-based Word Spotter (CTC-WS). For Transducer models, only Hybrid Transducer-CTC is supported. The idea is to use CTC log-probabilities for fast decoding with a context-biasing graph (tree). The graph is built from the context-biasing list of words/phrases and the CTC transition topology. The spotted words, with their start/end timestamps, are then merged with the greedy CTC/Transducer word-level predictions.
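As a rough illustration of the two ideas above (a prefix tree over the context-biasing list, and merging spotted words into the greedy word-level output by timestamps), here is a minimal sketch. It is not the code from this PR: `Word`, `build_context_trie`, `overlaps`, and `merge_spotted_words` are hypothetical names, the phrase-to-token-id mapping is made up, and the merge rule (a spotted word replaces every greedy word whose frame span it overlaps) is an assumption. The CTC topology (blank transitions and self-loops) is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: int  # first frame of the word
    end: int  # last frame of the word (inclusive)
    score: float = 0.0  # spotter score; higher is better


def build_context_trie(tokenized_phrases):
    """Build a prefix tree from {phrase: [token ids]} (hypothetical input format)."""
    root = {"children": {}, "phrase": None}
    for phrase, token_ids in tokenized_phrases.items():
        node = root
        for tok in token_ids:
            node = node["children"].setdefault(tok, {"children": {}, "phrase": None})
        node["phrase"] = phrase  # mark the node that completes a phrase
    return root


def overlaps(a, b):
    """True if the frame spans of two words intersect."""
    return not (a.end < b.start or a.start > b.end)


def merge_spotted_words(greedy_words, spotted_words):
    """Assumed merge rule: spotted words replace the greedy words they overlap."""
    chosen = []
    # Prefer higher-scoring candidates when spotted words overlap each other.
    for cand in sorted(spotted_words, key=lambda w: w.score, reverse=True):
        if not any(overlaps(cand, c) for c in chosen):
            chosen.append(cand)
    kept = [g for g in greedy_words if not any(overlaps(g, c) for c in chosen)]
    return sorted(kept + chosen, key=lambda w: w.start)


# Usage: the greedy output split "nemo" into two wrong words.
greedy = [Word("i", 0, 1), Word("use", 3, 5), Word("nee", 7, 8),
          Word("mo", 9, 10), Word("toolkit", 12, 16)]
spotted = [Word("nemo", 7, 10, score=-0.5)]
print(" ".join(w.text for w in merge_spotted_words(greedy, spotted)))
# i use nemo toolkit
```

Sharing prefixes in the tree means phrases that start with the same tokens are scored together up to the point where they diverge, which is one reason a tree is a natural fit for a biasing list.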
CTC-WS enables context biasing in greedy decoding mode, which is much faster than approaches based on shallow fusion (beam-search decoding with an external LM or a context-biasing graph).
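For reference, greedy CTC decoding itself is a single pass over the frames, which is why it is cheap compared with beam search. The sketch below is a generic illustration, not the NeMo implementation: `greedy_ctc_decode` is a hypothetical helper that takes per-frame log-probabilities as nested lists and returns each emitted token with its start and end frame, the kind of timing information needed to align spotted words with greedy predictions.

```python
def greedy_ctc_decode(logprobs, blank_id):
    """Greedy CTC decoding with frame-level token boundaries.

    logprobs: list of frames, each a list of per-token log-probabilities.
    Returns a list of (token_id, start_frame, end_frame) tuples.
    """
    tokens = []
    prev = blank_id
    for t, frame in enumerate(logprobs):
        # Pick the most likely token for this frame.
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != blank_id:
            if best == prev:
                # Repeated non-blank token: extend the current emission.
                tok, start, _ = tokens[-1]
                tokens[-1] = (tok, start, t)
            else:
                # New token: starts and (so far) ends at this frame.
                tokens.append((best, t, t))
        prev = best
    return tokens
```

A blank between two identical tokens separates them into two emissions, which is the standard CTC collapsing rule.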
A detailed description of the CTC-WS algorithm and the comparison results will be published later.
Collection: [ASR]
Changelog
Usage
Jenkins CI
To run Jenkins, a NeMo user with write access must comment "jenkins" on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information