Bugfixes for decoding with Flashlight decoder #8856
Conversation
Thank you very much for the fix!
The PR is ready to merge; however, we require all contributors to sign their commits. Could you follow the instructions from the DCO bot to resolve this: https://github.com/NVIDIA/NeMo/pull/8856/checks?check_run_id=23589977417
OK, will do so. But there is one more commit I want to make that fixes lexicon-free decoding. Or should I open a new PR for it? It only changes a few lines of code, and I think it's better to have it in a single PR.
Signed-off-by: Michael Hentschel <hentschel.michael@worksmobile.com>
Force-pushed from 60348e7 to a1dace8
for more information, see https://pre-commit.ci
Signed-off-by: Michael Hentschel <hentschel.michael@line-works.com>
Signed-off-by: Michael Hentschel <hentschel.michael@line-works.com>
@titu1994 I hope everything is fine now.
}
self.word_dict = create_word_dict(d)
# Dictionary contains a mapping of our ASR model's output IDs to the tokens in the vocabulary
self.word_dict = Dictionary(self.tokenizer_wrapper.vocab)
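To make the point of this change concrete, here is a minimal illustrative sketch (not the actual flashlight API; the function name is hypothetical): the decoder's token dictionary must map each ASR output ID to the token at the same index in the tokenizer's vocabulary, one entry per logit, with order preserved.

```python
# Illustrative only: shows why the dictionary must mirror the tokenizer's
# vocabulary order, not the real flashlight Dictionary class.

def build_token_dictionary(vocab):
    """Map output ID -> token, preserving the tokenizer's ordering."""
    return {idx: token for idx, token in enumerate(vocab)}

vocab = ["<unk>", "\u2581the", "\u2581a", "s", "ing"]  # toy vocabulary
token_dict = build_token_dictionary(vocab)

# The dictionary size must equal the size of the model's output layer
# (one entry per logit); a mismatch would misalign IDs and tokens.
assert len(token_dict) == len(vocab)
assert token_dict[0] == "<unk>"
```

If the entries were inserted in a different order (as can happen when building the dictionary from an intermediate structure), ID 0 would no longer point at the token the model's logit 0 actually represents, which is exactly the misalignment the fix addresses.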
The dictionary has to contain the same number of IDs as the size of the ASR model's output layer. I think we can't just add an <unk> token if it isn't already in the tokeniser's vocab.
I would need a review for this one, but I'm heading to a conference. @nithinraok can you ping the author of the file for review
Actually, if you remove the <unk>, it causes a lot of problems with silences. I remember I added this hack because we had a lot of problems with lexicon-free decoding: we were not generating spaces correctly, and adding this fixed it. This is from two years ago, so my memory of the specifics is hazy, but I would strongly urge you to test your new code without the <unk> in lexicon-free mode on both English and Japanese to make sure you're not seeing any strange behaviour (it is also possible that whatever issue I was seeing two years ago when I added this hack has since been fixed in the C++ part of the Flashlight code).
@trias702
Sorry for the late reply. I was at ICASSP, and last week was Golden Week in Japan (the default holiday season).
First things first: I haven't tested the implementation with a model/tokeniser that does not have an OOV token; all the tokenisers I'm using have one. I will have a look at the CTC probs for the OOV token to see whether it is actually output anywhere.
Regarding the Flashlight code, the corresponding decoding part is here
https://github.com/flashlight/text/blob/bbe9e3c201f5c9c3f3c0d553f0ea73af5e0a5209/flashlight/lib/text/decoder/LexiconFreeDecoder.cpp#L30-L125
The decoder iterates over the top token_size probabilities. If the AM has an OOV token, it should show up in the emission probabilities. Equally, if the LM doesn't have an OOV token, we should not be able to get a score for the OOV token, i.e., these lines should fail.
https://github.com/flashlight/text/blob/bbe9e3c201f5c9c3f3c0d553f0ea73af5e0a5209/flashlight/lib/text/decoder/LexiconFreeDecoder.cpp#L72-L73
because we get this error
https://github.com/flashlight/text/blob/bbe9e3c201f5c9c3f3c0d553f0ea73af5e0a5209/flashlight/lib/text/decoder/lm/KenLM.cpp#L66-L69
(I assume that an OOV token would have to be added after the tokeniser's vocab and have an index > usrToLmIdxMap_.size())
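A rough Python sketch of the index-map behaviour described above (all names are illustrative stand-ins, not the real C++ API): the LM wrapper maps user token indices to LM word IDs, and an index past the end of that map, such as an OOV token appended after the vocabulary, triggers the invalid-index error in KenLM.cpp.

```python
# Toy mirror of the usrToLmIdxMap_ bounds check referenced in KenLM.cpp.
# Names and behaviour are illustrative, not the actual flashlight API.

def score_token(usr_to_lm_idx_map, token_idx):
    """Look up a user token index in the user->LM index map."""
    if token_idx >= len(usr_to_lm_idx_map):
        # Mirrors the "Invalid user token index" failure path.
        raise IndexError(f"Invalid user token index: {token_idx}")
    return usr_to_lm_idx_map[token_idx]

idx_map = [10, 11, 12]  # toy mapping: user token ID -> LM word ID
assert score_token(idx_map, 1) == 11

oov_failed = False
try:
    score_token(idx_map, 3)  # index past the map, as in the OOV case above
except IndexError:
    oov_failed = True
assert oov_failed
```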
Force-pushed from 665d9c2 to 46589af
LGTM; however, I am not sure about wrapping self.tokenizer_wrapper.vocab with the flashlight Dictionary class and not adding the <unk> token if it's not there. Have you tested lexicon-free decoding with this change for both English and Japanese and confirmed it works correctly for both? That is my only hesitation, which I would ask to be tested and verified with the Riva team, but everything else LGTM.
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.
I had some time to test the fix with an English model. The model is the same as the baseline from here (CTC Conformer), and the LM is also the same (SentencePiece 6-gram). The tokeniser has an <unk> token.
Without bug fix
With bug fix
For reference
Sounds good. I appreciate you running these; LGTM, I'll approve.
LGTM
Signed-off-by: hentschel.michael@line-works.com
Force-pushed from 7f5b5d2 to 1814646
Signed-off-by: michiahe <michiahe@users.noreply.github.com>
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
What does this PR do?
Bugfixes for CTC beam search decoding with Flashlight.
Fix the maximum decoding length for CTC beam search decoding with Flashlight. The maximum sequence length has to be passed to the Flashlight decoder, otherwise junk/masked frames are decoded (resulting in insertion errors etc.). This info is already passed properly to the OpenSeq2Seq and pyctcdecode decoders by restricting the sequence length.
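The length fix can be sketched as follows (an illustrative stand-in, not NeMo's actual code; the function name is hypothetical): padded frames past each utterance's true length must never reach the decoder, or they decode as spurious insertions, so the per-utterance log-probs are restricted to the valid frames first.

```python
import numpy as np

def trim_log_probs(batch_log_probs, seq_lens):
    """Restrict each utterance's log-probs to its valid frames,
    so padded/junk frames are never handed to the beam search decoder."""
    return [lp[:n] for lp, n in zip(batch_log_probs, seq_lens)]

# 2 utterances padded to 8 frames, 5-token vocabulary
batch = np.zeros((2, 8, 5))
trimmed = trim_log_probs(batch, [6, 8])
assert trimmed[0].shape == (6, 5)  # only the 6 valid frames survive
assert trimmed[1].shape == (8, 5)  # full-length utterance is untouched
```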
Fix the dictionary creation for lexicon-free decoding. The dictionary has to contain a mapping from the ASR model's output IDs to the tokens in the vocabulary, i.e., exactly the same as the tokeniser's vocabulary. The order of this mapping matters.
Collection: ASR
Changelog
Usage
# Add a code snippet demonstrating how to use this
Jenkins CI
To run Jenkins, a NeMo user with write access must comment
jenkins
on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contain specific people who can review PRs to various areas.
Additional Information