
Adding cache-aware streaming Conformer with look-ahead support #3888

Merged
merged 247 commits into main on Aug 3, 2022

Conversation

@VahidooX (Collaborator) commented Mar 26, 2022

What does this PR do?

This PR adds cache-aware streaming Conformer training and inference with look-ahead support. A model is trained with a limited effective right context, and streaming inference is then performed with activation caching. Limiting the right context reduces accuracy compared to an offline model, but it gives better accuracy and significantly higher throughput than buffer-based streaming because it drops the duplicated computation that buffering incurs. A larger right context decreases the WER while increasing the latency.

It supports the following three modes:
1. Fully causal model with zero look-ahead and zero latency
2. Regular look-ahead
3. Chunk-aware look-ahead with a small amount of duplicated computation

Both Conformer-CTC and Conformer-Transducer are supported. They can be trained with the regular training scripts, using the config files in the following folder:
NeMo/examples/asr/conf/conformer/streaming/

A model trained in streaming mode can be evaluated with the following script:
NeMo/examples/asr/asr_streaming/speech_to_text_streaming_infer.py

This script simulates streaming inference for a single audio file or a manifest of audio files. For a manifest, streaming can be run in multi-stream mode (batched inference) to speed it up. The script can also compare the results with offline evaluation and report the differences in both the WER and the models' outputs.

The accuracy of the model is exactly the same in offline evaluation and in streaming. In offline mode the whole audio is passed through the model at once, while in streaming mode the audio is passed chunk by chunk, with the caches carrying the context between chunks.
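
To make the chunked flow concrete, here is a minimal sketch of what such a cache-aware streaming loop looks like. The names below (stream_audio, stream_step, audio_chunks) are hypothetical placeholders, not the actual NeMo API; the real entry points are the evaluation script above and the streaming methods added to the Conformer encoder in this PR.

import torch

def stream_audio(asr_encoder, audio_chunks):
    # The caches carry the left context across chunks: one cache for the
    # self-attention layers and one for the causal convolutions, so no
    # computation is repeated from one chunk to the next (hypothetical sketch).
    cache_last_channel = None
    cache_last_time = None
    outputs = []
    for chunk, chunk_len in audio_chunks:
        # Placeholder call; during review the PR's actual method was renamed
        # to cache_aware_stream_step.
        out, out_len, cache_last_channel, cache_last_time = asr_encoder.stream_step(
            chunk,
            chunk_len,
            cache_last_channel=cache_last_channel,
            cache_last_time=cache_last_time,
        )
        outputs.append(out)
    # Concatenate the per-chunk encoder outputs along the time axis.
    return torch.cat(outputs, dim=-1)

In offline mode the same encoder simply receives the whole signal in a single call, which is why the two paths can produce matching outputs when the model is trained with the same limited context.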

Changelog

  • Added frame-wise streaming Conformer models with look-ahead support and caching mechanism for streaming inference.

Usage

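The streaming evaluation can be launched, for example, with the invocation used later in this thread (the model and audio paths below are placeholders to replace with your own):

python examples/asr/asr_streaming/speech_to_text_streaming_infer.py \
    --asr_model <path_to_streaming_model>.nemo \
    --audio_file <path_to_audio>.wav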

PR Type:

  • [x] New Feature
  • [ ] Bugfix
  • [ ] Documentation

lgtm-com bot commented Aug 2, 2022

This pull request introduces 7 alerts and fixes 4 when merging c2cfe4e into 8e1436b - view on LGTM.com

new alerts:

  • 6 for Unused local variable
  • 1 for Non-callable called

fixed alerts:

  • 4 for Unused import

lgtm-com bot commented Aug 2, 2022

This pull request introduces 7 alerts and fixes 4 when merging 7589f88 into aaeac3c - view on LGTM.com

new alerts:

  • 6 for Unused local variable
  • 1 for Non-callable called

fixed alerts:

  • 4 for Unused import

lgtm-com bot commented Aug 2, 2022

This pull request introduces 7 alerts and fixes 4 when merging 0bde720 into 5c8fe3a - view on LGTM.com

new alerts:

  • 6 for Unused local variable
  • 1 for Non-callable called

fixed alerts:

  • 4 for Unused import

lgtm-com bot commented Aug 2, 2022

This pull request introduces 7 alerts and fixes 4 when merging 090f838 into 5c8fe3a - view on LGTM.com

new alerts:

  • 6 for Unused local variable
  • 1 for Non-callable called

fixed alerts:

  • 4 for Unused import

lgtm-com bot commented Aug 3, 2022

This pull request introduces 9 alerts and fixes 4 when merging 463aed6 into 5c8fe3a - view on LGTM.com

new alerts:

  • 8 for Unused local variable
  • 1 for Non-callable called

fixed alerts:

  • 4 for Unused import

@titu1994 (Collaborator) previously approved these changes Aug 3, 2022 and left a comment:

Approving for now since we're out of time. But before merge, rename the function to cache_aware_stream_step: the basic stream_step is too generic, does not convey what is being used, and is not future-proof.

start_time = time.time()
for sample_idx, sample in enumerate(samples):
    processed_signal, processed_signal_length, stream_id = streaming_buffer.append_audio_file(
        sample['audio_filepath'], stream_id=-1
    )
Collaborator:

We need to document this script a lot more in the branch cut for 1.11. For now it's fine.

if (sample_idx + 1) % args.batch_size == 0 or sample_idx == len(samples) - 1:
    logging.info(f"Starting to stream samples {sample_idx - len(streaming_buffer) + 1} to {sample_idx}...")
    streaming_tran, offline_tran = perform_streaming(
        asr_model=asr_model,
Collaborator:

^ comment

if hasattr(self.input_module, 'forward_for_export'):
    encoder_output = self.input_module.forward_for_export(input, length)
if cache_last_channel is None and cache_last_time is None:
Collaborator:

OK, leaving this comment unresolved for a later check then.

nemo/collections/asr/parts/mixins/streaming.py (review thread outdated and resolved)

lgtm-com bot commented Aug 3, 2022

This pull request introduces 9 alerts and fixes 4 when merging 194581f into 498ff20 - view on LGTM.com

new alerts:

  • 8 for Unused local variable
  • 1 for Non-callable called

fixed alerts:

  • 4 for Unused import

@VahidooX merged commit eae1684 into NVIDIA:main Aug 3, 2022

@effendijohanes commented:

Hi @VahidooX, thanks for the examples you made. I tried a 2-minute wav file with the stt_en_conformer_transducer_small.nemo model,

python examples/asr/asr_streaming/speech_to_text_streaming_infer.py --asr_model stt_en_conformer_transducer_small.nemo --audio_file test.wav

but I get this error during online mode:

Traceback (most recent call last):
  File ".../nemo/sandbox/../examples/asr/asr_streaming/speech_to_text_streaming_infer.py", line 333, in <module>
    main()
  File ".../nemo/sandbox/../examples/asr/asr_streaming/speech_to_text_streaming_infer.py", line 261, in main
    perform_streaming(
  File ".../nemo/sandbox/../examples/asr/asr_streaming/speech_to_text_streaming_infer.py", line 128, in perform_streaming
    ) = asr_model.conformer_stream_step(
  File ".../nemo/nemo/collections/asr/parts/mixins/mixins.py", line 441, in conformer_stream_step
    (encoded, encoded_len, cache_last_channel_next, cache_last_time_next) = self.encoder.cache_aware_stream_step(
  File ".../nemo/nemo/collections/asr/parts/mixins/streaming.py", line 61, in cache_aware_stream_step
    encoder_output = self(
  File ".../nemo/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File ".../nemo/nemo/core/classes/common.py", line 1084, in __call__
    outputs = wrapped(*args, **kwargs)
  File ".../nemo/nemo/collections/asr/modules/conformer_encoder.py", line 382, in forward
    cache_last_channel_next = torch.zeros(
RuntimeError: Trying to create tensor with negative dimension -7204: [16, 1, -7204, 176]

do you have any idea what might have happened? Thanks!

@VahidooX (Collaborator, Author) commented Aug 3, 2022

> Hi @VahidooX, thanks for the examples you made. I tried a 2-minute wav file with the stt_en_conformer_transducer_small.nemo model, [...] do you have any idea what might have happened? Thanks!

This approach needs you to train a model in streaming mode to get the best results, which means with limited right and left context and no normalization in the feature extraction. While it may be possible to try offline models with this approach, the accuracy would not be great. I have not added support for offline models in this PR; I will look into it and add it soon.

@itzsimpl (Contributor) commented Aug 3, 2022

@VahidooX are there perhaps any pre-trained streaming models already available?

@VahidooX (Collaborator, Author) commented Aug 3, 2022

> @VahidooX are there perhaps any pre-trained streaming models already available?

Not yet, I am still working on training them on NeMo ASRSet. Hopefully some will be uploaded to NGC by the end of this month.

@effendijohanes commented:

Hi @VahidooX, looking forward to the support for offline models, thank you very much!

@VahidooX (Collaborator, Author) commented Aug 5, 2022

> Hi @VahidooX, looking forward to the support for offline models, thank you very much!

Here is the draft PR to add support for models trained with full context to be used with cache-aware streaming in chunk-aware look-ahead style:

#4687

Just note that the results would be significantly worse than when you train the model in streaming mode. I will share some numbers in the PR when they are ready. The main advantage of using this approach with an offline model, compared to buffered streaming, is that it uses less computation. The cache-aware approach is unlikely to give better accuracy for such models, as they don't use overlapping chunks in chunk-aware mode. I will try to add support for regular look-ahead, which uses overlapping chunks.

@effendijohanes commented:

Thanks for the PR @VahidooX, let me study your code.

Davood-M pushed a commit to Davood-M/NeMo that referenced this pull request Aug 9, 2022
piraka9011 pushed a commit to piraka9011/NeMo that referenced this pull request Aug 25, 2022
@shahin-trunk commented:

> @VahidooX are there perhaps any pre-trained streaming models already available?
>
> Not yet, I am still working on training them on NeMo ASRSet. Hopefully some will be uploaded to NGC by the end of this month.

@VahidooX any update on the pre-trained models? I am not able to get the models to converge without initializing the weights.

@Higher08 commented:

@VahidooX Did you manage to train these models on NeMo ASRSet? If yes, can you send the files?

hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022