@andrusenkoau

What does this PR do?

Adds a new context-biasing method for CTC and Transducer (RNNT) models, based on a CTC-based Word Spotter (CTC-WS). For Transducer models, only Hybrid Transducer-CTC models are supported. The idea is to use CTC log-probabilities for fast decoding against a context-biasing graph (a tree). The graph is built from the context-biasing list of words/phrases and the CTC transition topology. The spotted words, with their start/end timestamps, are then merged with the greedy CTC/Transducer word-level predictions.
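
As a rough illustration of the two ideas above (a prefix tree over the context list, and merging spotted words into the greedy output by timestamp), here is a minimal sketch. It is not the code from this PR: `Word`, `build_context_trie`, and `merge_spotted_words` are hypothetical names, characters stand in for the model's subword tokens, and the overlap-based replacement rule is an assumption about how the merge could work.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    text: str
    start: int          # first frame index (inclusive)
    end: int            # last frame index (inclusive)
    score: float = 0.0  # spotter score; unused for greedy words


def build_context_trie(phrases: List[str]) -> Dict:
    """Build a prefix tree over the context phrases.

    Characters are used as tokens here; a real system would use the
    model's tokenizer output instead (assumption).
    """
    root: Dict = {}
    for phrase in phrases:
        node = root
        for token in phrase:
            node = node.setdefault(token, {})
        node["<end>"] = phrase  # marks a complete phrase
    return root


def overlaps(a: Word, b: Word) -> bool:
    """True if the two frame intervals share at least one frame."""
    return a.start <= b.end and b.start <= a.end


def merge_spotted_words(greedy: List[Word], spotted: List[Word]) -> List[Word]:
    """Replace greedy words that overlap a spotted word with that word.

    Assumes the spotted words do not overlap each other.
    """
    merged = list(greedy)
    for ws in sorted(spotted, key=lambda w: w.start):
        merged = [w for w in merged if not overlaps(w, ws)]
        merged.append(ws)
    merged.sort(key=lambda w: w.start)
    return merged


trie = build_context_trie(["nvidia", "nemo"])
greedy = [Word("use", 0, 3), Word("nee", 5, 6), Word("mo", 7, 9), Word("now", 11, 13)]
spotted = [Word("nemo", 5, 9, score=4.2)]
merged_text = [w.text for w in merge_spotted_words(greedy, spotted)]
print(merged_text)
```

In this toy input the two misrecognized greedy words that fall inside the spotted interval are dropped and the spotted phrase takes their place; words outside the interval are left untouched.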

CTC-WS enables context biasing in greedy decoding mode, which is much faster than shallow-fusion approaches (beam search decoding with an external LM or a context-biasing graph).

A detailed description of the CTC-WS algorithm and the comparison results will be published later.

Collection: [ASR]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

python scripts/asr_context_biasing/eval_greedy_decoding_with_context_biasing.py \
            nemo_model_file=<path to the .nemo file of the model> \
            input_manifest=<path to the evaluation JSON manifest file> \
            preds_output_folder=<optional folder to store the predictions> \
            decoder_type=<type of model decoder [ctc or rnnt]> \
            acoustic_batch_size=<batch size to calculate log probabilities> \
            apply_context_biasing=<True or False to apply context biasing> \
            context_file=<path to the context biasing file with key words/phrases> \
            beam_threshold=[<list of the beam thresholds, separated with commas>] \
            context_score=[<list of the context scores, separated with commas>] \
            ctc_ali_token_weight=[<list of the ctc alignment token weights, separated with commas>]
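
The last three parameters each take a list. The description does not say how the lists are combined; if the script evaluates every combination (an assumption), the number of runs is the product of the list lengths. The values below are hypothetical and only show how quickly such a grid grows.

```python
from itertools import product

# Hypothetical values, chosen only for illustration.
beam_threshold = [5.0, 7.0]
context_score = [3.0, 4.0]
ctc_ali_token_weight = [0.5]

# Every combination of the three lists (assumed sweep behavior).
grid = list(product(beam_threshold, context_score, ctc_ali_token_weight))

for bt, cs, w in grid:
    print(f"beam_threshold={bt} context_score={cs} ctc_ali_token_weight={w}")
```

With these lists the sweep would cover 2 x 2 x 1 = 4 configurations, so keeping each list short matters when every configuration requires a full decoding pass.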

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
@github-actions github-actions bot added the ASR label Jan 23, 2024
@andrusenkoau andrusenkoau added the feature request/PR for a new feature label Jan 23, 2024
andrusenkoau and others added 3 commits January 23, 2024 06:46
@andrusenkoau
Collaborator Author

jenkins


@artbataev artbataev left a comment

Thanks for the work and tests!
Minor comments, mostly related to style.

assert len(ws_results) == 0

# with context biasing
ws_results = context_biasing.run_word_spotter(

Cool tests to cover primary functionality! Thanks!

@andrusenkoau
Collaborator Author

jenkins


@artbataev artbataev left a comment

LGTM. Thank you!

@andrusenkoau
Collaborator Author

jenkins

@andrusenkoau andrusenkoau merged commit 0bfac69 into NVIDIA-NeMo:main Feb 13, 2024
ssh-meister pushed a commit to ssh-meister/NeMo that referenced this pull request Feb 15, 2024
* initial commit

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* some fixes

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* fix blank_idx slow down

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* add new non-blank pruning

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* some fixes

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* some fixes

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* descriptions fix

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* description fixes

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* add ctc only model

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* move scripts to nemo asr parts

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* remove scripts from scripts dir

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* add first test

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* add some tests

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* add test

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* some fixes

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* fixes

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix circular import

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix preds_output_folder

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* set loop_lables=True

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* add .json to output manifest name

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* fix rnnt wer degradation

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* add round(score) for test

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bow token

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review fixes

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review fixes

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add epsilon shift in alignment

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* fix transcribe modification

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix autocast

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

---------

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Sasha Meister <ameister@nvidia.com>
pablo-garay pushed a commit that referenced this pull request Mar 19, 2024
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024