Adding cache-aware streaming Conformer with look-ahead support #3888

Merged: 247 commits, Aug 3, 2022
Changes from 238 commits
Commits
247 commits
072d377
added causal conv.
VahidooX Nov 16, 2021
ad18c12
added causal conv.
VahidooX Nov 16, 2021
584b1bf
added causal conv.
VahidooX Nov 16, 2021
9911571
added causal conv.
VahidooX Nov 16, 2021
e3c68b1
added causal conv.
VahidooX Nov 16, 2021
ca03d82
added causal conv.
VahidooX Nov 16, 2021
6c3d968
added causal conv.
VahidooX Nov 16, 2021
ee51fd8
Merge branch 'main' of https://github.com/NVIDIA/NeMo into add_casual…
VahidooX Nov 20, 2021
059d0c3
added caching. made convolutions causal.
VahidooX Nov 20, 2021
0d0cbb6
separated caches.
VahidooX Nov 20, 2021
6c1aa9a
separated caches.
VahidooX Nov 21, 2021
281ae4f
moved caching outside downsampling convs.
VahidooX Nov 22, 2021
bc6139e
made paddings non-symmetric.
VahidooX Nov 22, 2021
35dfb42
made paddings non-symmetric.
VahidooX Nov 22, 2021
f7788fa
added streaming script
VahidooX Nov 22, 2021
52be159
added streaming script
VahidooX Nov 22, 2021
f798133
added streaming script
VahidooX Nov 22, 2021
30679c9
added clone.
VahidooX Nov 22, 2021
55e656d
add streaming mode coversion.
VahidooX Nov 22, 2021
11c9a84
add streaming mode coversion.
VahidooX Nov 22, 2021
60c7d19
add streaming mode coversion.
VahidooX Nov 22, 2021
be191a0
added next_cache.
VahidooX Dec 2, 2021
89ffacb
added next_cache.
VahidooX Dec 2, 2021
31878aa
added next_cache.
VahidooX Dec 3, 2021
dd08bf2
added next_cache.
VahidooX Dec 3, 2021
0af0d7d
added next_cache.
VahidooX Dec 3, 2021
23191fa
added next_cache.
VahidooX Dec 3, 2021
26f9b10
added next_cache.
VahidooX Dec 3, 2021
3a54675
added next_cache.
VahidooX Dec 3, 2021
9690652
added max_cache_len to attention.
VahidooX Dec 3, 2021
a214f06
added max_cache_len to attention.
VahidooX Dec 3, 2021
b05fcbc
added max_cache_len to attention.
VahidooX Dec 4, 2021
a378cab
fixed the bug.
VahidooX Dec 15, 2021
adcae5e
Merge branch 'main' of https://github.com/NVIDIA/NeMo into add_casual…
VahidooX Jan 20, 2022
6062ff5
fixed bugs.
VahidooX Feb 3, 2022
afb0c3c
added att_context_style param.
VahidooX Feb 4, 2022
a74c6cc
added att_context_style param.
VahidooX Feb 4, 2022
8792142
added att_context_style param.
VahidooX Feb 4, 2022
47e5743
added stacking downsampling.
VahidooX Feb 4, 2022
34003ee
added stacking downsampling.
VahidooX Feb 4, 2022
59e2f0b
added conv_context_size
VahidooX Feb 4, 2022
bd2672c
added conv_context_size
VahidooX Feb 4, 2022
e0e9d19
added conv_context_size
VahidooX Feb 4, 2022
7c55a96
added conv_context_size
VahidooX Feb 4, 2022
238bab4
added conv_context_size
VahidooX Feb 4, 2022
237571b
added conv_context_size
VahidooX Feb 4, 2022
9e2080b
bug fixed for -1 context.
VahidooX Feb 8, 2022
88c2e48
bug fixed for -1 context.
VahidooX Feb 8, 2022
7e41d3c
Added look ahead support.
VahidooX Feb 12, 2022
77da09f
fixed transducer.
VahidooX Feb 12, 2022
812592f
Merge branch 'main' of https://github.com/NVIDIA/NeMo into casual_con…
VahidooX Feb 12, 2022
b926606
dropped pre_encode cache.
VahidooX Feb 12, 2022
6a6365f
trying to fix rnnt decoder.
VahidooX Feb 13, 2022
d3eaa68
reverted fixed the bug for bidirectional.
VahidooX Feb 13, 2022
8e642fc
reverted fixed the bug for bidirectional.
VahidooX Feb 14, 2022
0de8496
fixed the decoder bug.
VahidooX Feb 14, 2022
f127049
CLEANUP.
VahidooX Feb 14, 2022
d0429db
CLEANUP.
VahidooX Feb 14, 2022
a56f9c2
CLEANUP.
VahidooX Feb 14, 2022
021bbec
fixed.
VahidooX Feb 14, 2022
08ea9e9
fixed.
VahidooX Feb 14, 2022
5bfd301
fixed the bug.
VahidooX Feb 15, 2022
3578015
Trying to fix onnx.
VahidooX Feb 15, 2022
15b5729
FIXED.
VahidooX Feb 15, 2022
c6fbafc
added support for onnx.
VahidooX Feb 17, 2022
28aed44
added support for onnx.
VahidooX Feb 17, 2022
5399b66
enabled ctc output print.
VahidooX Feb 17, 2022
6d492e3
moved the triu.
VahidooX Feb 18, 2022
69be838
moved the triu.
VahidooX Feb 18, 2022
be62bb6
trying to add onnx runtime support.
VahidooX Feb 23, 2022
c18dab8
Update to new design.
VahidooX Mar 1, 2022
aa8f501
Update to new design.
VahidooX Mar 1, 2022
8ee8b6d
Update to new design.
VahidooX Mar 2, 2022
94290cd
Update to new design.
VahidooX Mar 2, 2022
db6d256
added FramewiseStreamingAudioBuffer
VahidooX Mar 2, 2022
d6986aa
added FramewiseStreamingAudioBuffer
VahidooX Mar 4, 2022
9ae111e
added FramewiseStreamingAudioBuffer
VahidooX Mar 4, 2022
5bb3d46
added FramewiseStreamingAudioBuffer
VahidooX Mar 4, 2022
39466de
added FramewiseStreamingAudioBuffer
VahidooX Mar 4, 2022
6ade6f6
added FramewiseStreamingAudioBuffer
VahidooX Mar 4, 2022
a16649c
added FramewiseStreamingAudioBuffer
VahidooX Mar 4, 2022
6da65d8
added FramewiseStreamingAudioBuffer
VahidooX Mar 4, 2022
9215604
added step_stream
VahidooX Mar 4, 2022
5c58517
added step_stream
VahidooX Mar 4, 2022
a54447d
added step_stream
VahidooX Mar 4, 2022
a28b0f8
cleaned up.
VahidooX Mar 4, 2022
55e75c8
cleaned up.
VahidooX Mar 5, 2022
10b9258
added refiner_shuffle_rate
VahidooX Mar 6, 2022
e57dcbf
added batch support.
VahidooX Mar 12, 2022
c070041
added batch support.
VahidooX Mar 12, 2022
18fbb6f
added batch support.
VahidooX Mar 12, 2022
444c322
fixed the bugs for lengths.
VahidooX Mar 13, 2022
f8afb91
fixed the bugs for lengths.
VahidooX Mar 13, 2022
e40bcca
added rnnt support.
VahidooX Mar 13, 2022
a3d1518
added rnnt support.
VahidooX Mar 13, 2022
897672c
added rnnt support.
VahidooX Mar 13, 2022
6c1f9ae
added rnnt support.
VahidooX Mar 13, 2022
6e00c11
fixed the bug.
VahidooX Mar 13, 2022
df0f169
added verbose.
VahidooX Mar 13, 2022
ec2fa9e
added verbose.
VahidooX Mar 14, 2022
3d31a55
added verbose.
VahidooX Mar 14, 2022
4f41299
added verbose.
VahidooX Mar 14, 2022
a0bf383
fixed rnnt models.
VahidooX Mar 14, 2022
6fa9c58
fixed rnnt models.
VahidooX Mar 14, 2022
79d0865
fixed rnnt models.
VahidooX Mar 14, 2022
e3ec0aa
fixed rnnt models.
VahidooX Mar 14, 2022
fced645
fixed rnnt models.
VahidooX Mar 14, 2022
6e87ef3
fixed rnnt models.
VahidooX Mar 14, 2022
88a2062
cleaned code.
VahidooX Mar 14, 2022
6915d3b
fixed ctc code.
VahidooX Mar 14, 2022
edbd032
fixed ctc code.
VahidooX Mar 14, 2022
2724e3f
fixed ctc code.
VahidooX Mar 14, 2022
7be3466
cleaned the code
VahidooX Mar 14, 2022
d2f81ca
cleaned the code
VahidooX Mar 15, 2022
01ba464
cleaned the code
VahidooX Mar 15, 2022
b121043
cleaned the code
VahidooX Mar 15, 2022
5575769
added wer calc.
VahidooX Mar 15, 2022
17a2d54
added wer calc.
VahidooX Mar 15, 2022
b19b876
added wer calc.
VahidooX Mar 15, 2022
420c68e
added wer calc.
VahidooX Mar 15, 2022
fe5d2b6
added wer calc.
VahidooX Mar 15, 2022
7b93de3
added wer calc.
VahidooX Mar 15, 2022
c5d9525
added wer calc.
VahidooX Mar 15, 2022
ecda220
added wer calc.
VahidooX Mar 15, 2022
ec5bc10
added wer calc.
VahidooX Mar 15, 2022
4bd5f3d
added wer calc.
VahidooX Mar 15, 2022
5e376c0
FIXED class names.
VahidooX Mar 15, 2022
5065073
FIXED class names.
VahidooX Mar 16, 2022
755cbb7
FIXED class names.
VahidooX Mar 23, 2022
775e3cf
dropped init_vars.
VahidooX Mar 23, 2022
febfa52
added pre_pad.
VahidooX Mar 24, 2022
5565202
added pre_pad.
VahidooX Mar 24, 2022
424c674
added pre_pad.
VahidooX Mar 24, 2022
2e4e06a
fixed added pre_pad.
VahidooX Mar 24, 2022
56b5703
added skip_nan_grad.
VahidooX Mar 25, 2022
03783c2
merged main.
VahidooX Mar 26, 2022
9cfffe6
fixed online normalization
VahidooX Mar 30, 2022
dc61970
added timeing.
VahidooX Mar 31, 2022
a12fa24
fixed layer norm typ checking.
VahidooX Mar 31, 2022
3c00c83
adding onnx support.
VahidooX Apr 1, 2022
339ea0a
pull from main.
VahidooX Apr 3, 2022
cbef980
pull from main.
VahidooX Apr 3, 2022
de7b3ae
pull from main.
VahidooX Apr 3, 2022
b83caf0
fixing stacking.
VahidooX Apr 4, 2022
0495c52
added support for mean value.
VahidooX Apr 5, 2022
40a5783
added support for mean value.
VahidooX Apr 5, 2022
0a8ec53
added support for mean value.
VahidooX Apr 5, 2022
87c1807
added support for mean value.
VahidooX Apr 6, 2022
774410c
added support for mean value.
VahidooX Apr 6, 2022
b56178c
disabled nan_grad.
VahidooX Apr 8, 2022
995d08e
added onnx to rnnt.
VahidooX Apr 22, 2022
c91a1e7
fixed the bug in buffer streamer.
VahidooX Apr 23, 2022
9e0d4f9
fixed the bug in buffer streamer.
VahidooX Apr 24, 2022
d67ba58
fixed the bug in buffer streamer.
VahidooX Apr 26, 2022
7b0902f
fixed the bug in buffer streamer.
VahidooX Apr 27, 2022
811c774
fixed the bug in buffer streamer.
VahidooX Apr 27, 2022
1b90e2f
fixed the bug in buffer streamer.
VahidooX Apr 27, 2022
ff6ece2
fixed the bug in stacking.
VahidooX Apr 28, 2022
7c1a46c
fixed the bug in stacking.
VahidooX Apr 28, 2022
946c928
fixed the conv chachinf bug.
VahidooX Apr 29, 2022
2367120
fixed the conv chachinf bug.
VahidooX Apr 29, 2022
0f714da
add do_caching.
VahidooX Apr 29, 2022
f1b47ea
add calc_drop_extra_pre_encoded.
VahidooX Apr 29, 2022
e436ee5
add calc_drop_extra_pre_encoded.
VahidooX Apr 30, 2022
303aea4
added group norm.
VahidooX May 1, 2022
87f001a
added group norm.
VahidooX May 2, 2022
2e02b1b
added group norm.
VahidooX May 5, 2022
73557b3
added group norm.
VahidooX May 5, 2022
be675ec
pulled from main.
VahidooX May 5, 2022
6e91f96
added prenorm
VahidooX May 9, 2022
c1af1aa
added prenorm
VahidooX May 9, 2022
75c1d7c
moved cache_last_channel_next to stream step.
VahidooX May 11, 2022
860995f
moved prenorm
VahidooX May 11, 2022
c1bc3a5
moved prenorm, fixed the style
VahidooX May 12, 2022
99ea203
moved prenorm, fixed the style
VahidooX May 12, 2022
3883967
moved prenorm, fixed the style
VahidooX May 15, 2022
5ad6bde
added mlp for prenorm.
VahidooX May 15, 2022
93585bd
added mlp for prenorm.
VahidooX May 16, 2022
dc1fb84
added mlp for prenorm.
VahidooX May 17, 2022
4c56214
added onnx script.
VahidooX May 17, 2022
879517d
added onnx script.
VahidooX May 17, 2022
14180a8
added onnx script.
VahidooX May 17, 2022
6390a3e
added onnx script.
VahidooX May 17, 2022
73a3e7c
makde drop_extra_pre_encode to integer.
VahidooX May 17, 2022
a87554c
cleaned the code.
VahidooX May 17, 2022
e9251eb
cleaned the code.
VahidooX May 17, 2022
e358708
cleaned the code.
VahidooX May 17, 2022
391a259
cleaned the code.
VahidooX May 17, 2022
f656fa2
Merge branch 'main' of https://github.com/NVIDIA/NeMo into casual_con…
VahidooX May 17, 2022
7aa519b
cleaned the code.
VahidooX May 17, 2022
dbc409d
cleaned the code.
VahidooX May 18, 2022
2bc6fd8
fixed the onnx conversion without caching.
VahidooX May 18, 2022
1065c66
fixed the onnx conversion without caching.
VahidooX May 18, 2022
a420306
fixed the onnx conversion without caching.
VahidooX May 18, 2022
232a2ad
fixed the bug.
VahidooX May 18, 2022
ac1a5b1
dropped onnx support.
VahidooX May 18, 2022
11fc147
cleaned.
VahidooX May 18, 2022
44579cc
cleaned.
VahidooX May 18, 2022
87f5ea3
cleaned.
VahidooX May 18, 2022
02a4591
cleaned.
VahidooX May 18, 2022
a52d483
cleaned.
VahidooX May 18, 2022
fe70bf1
adding docs
VahidooX May 19, 2022
2798fe6
Merge branch 'main' of https://github.com/NVIDIA/NeMo into casual_con…
VahidooX May 19, 2022
2673650
adding docs
VahidooX May 19, 2022
a11961b
Merge branch 'main' of https://github.com/NVIDIA/NeMo into casual_con…
VahidooX May 21, 2022
7e446a4
disbbled mlp for stacking.
VahidooX May 21, 2022
8ce2804
added back eval_streaming_onxx.py.
VahidooX May 23, 2022
467c399
Merge branch 'main' of https://github.com/NVIDIA/NeMo into casual_con…
VahidooX May 24, 2022
eec50d4
added back eval_streaming_onxx.py.
VahidooX May 24, 2022
c93f772
cleaned.
VahidooX May 24, 2022
5a4fecd
cleaned.
VahidooX May 24, 2022
3ebd18a
fixed jenkins.
VahidooX May 24, 2022
0c481d2
fixed jenkins.
VahidooX May 24, 2022
b9eb558
fixed att_mask.
VahidooX May 24, 2022
af1f578
MADE valid_out_len a single number.
VahidooX May 24, 2022
0940a18
MADE valid_out_len a single number.
VahidooX May 24, 2022
2448484
changed valid_out_len to keep_all_outputs.
VahidooX Jun 2, 2022
735ab05
addressed comments.
VahidooX Jun 21, 2022
ff5d55f
addressed comments.
VahidooX Jun 21, 2022
53a628f
addressed comments.
VahidooX Jun 21, 2022
34e4922
pulled from main.
VahidooX Jun 21, 2022
b6fb106
Merge branch 'main' of https://github.com/NVIDIA/NeMo into casual_con…
VahidooX Jun 21, 2022
c19f435
pulled from main.
VahidooX Jun 21, 2022
e80a727
fixed bugs.
VahidooX Jun 21, 2022
e73eabe
pulled from main.
VahidooX Jun 22, 2022
f8aa6a1
fixed style
VahidooX Jun 22, 2022
5881b46
Fix ctc decoding.
VahidooX Aug 1, 2022
0b00676
addressed comments.
VahidooX Aug 1, 2022
fc747b9
addressed comments.
VahidooX Aug 1, 2022
89a5d28
addressed comments.
VahidooX Aug 1, 2022
affbc73
fixed style. dropped extra export_forward.
VahidooX Aug 1, 2022
0e959a3
pulled from main.
VahidooX Aug 1, 2022
20d0b49
pulled from main.
VahidooX Aug 1, 2022
59b7c62
fixed online normazliation.
VahidooX Aug 1, 2022
b7eb7c8
fixed onnx conversion.
VahidooX Aug 2, 2022
d55cca9
fixed onnx conversion.
VahidooX Aug 2, 2022
a5be0bf
Merge branch 'main' of https://github.com/NVIDIA/NeMo into casual_con…
VahidooX Aug 2, 2022
e30aa01
addrressed comments.
VahidooX Aug 2, 2022
c2cfe4e
fixed sampling.
VahidooX Aug 2, 2022
7589f88
fixed bug in depthwise conv.
VahidooX Aug 2, 2022
0bde720
Merge branch 'main' of https://github.com/NVIDIA/NeMo into casual_con…
VahidooX Aug 2, 2022
090f838
fixed bug in depthwise conv.
VahidooX Aug 2, 2022
268a639
fixed bug in depthwise conv.
VahidooX Aug 3, 2022
fdcb3a1
cleaned docs.
VahidooX Aug 3, 2022
463aed6
cleaned docs.
VahidooX Aug 3, 2022
b5d8306
cleaned docs.
VahidooX Aug 3, 2022
194581f
Merge branch 'main' of https://github.com/NVIDIA/NeMo into casual_con…
VahidooX Aug 3, 2022
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -1504,7 +1504,7 @@ pipeline {
model.data_dir=/home/TestData/nlp/new_multiatis \
model.validation_ds.prefix=dev \
model.test_ds.prefix=dev \
- trainer.gpus=[0] \
+ trainer.devices=[0] \
+trainer.fast_dev_run=true \
exp_manager.exp_dir=checkpoints2'
sh 'rm -rf checkpoints2'
56 changes: 56 additions & 0 deletions docs/source/asr/models.rst
@@ -127,6 +127,62 @@ You may find the example config files of Conformer-Transducer model with charact
``<NeMo_git_root>/examples/asr/conf/conformer/conformer_transducer_char.yaml`` and
with sub-word encoding at ``<NeMo_git_root>/examples/asr/conf/conformer/conformer_transducer_bpe.yaml``.

Streaming Conformer
-------------------

Streaming Conformer models are variants of Conformer trained with limited right context, which enables them to be used very efficiently for frame-wise streaming.
Three categories of layers in Conformer have access to right tokens: 1) depthwise convolutions, 2) self-attention, and 3) the convolutions in the downsampling layers.
Streaming Conformer models use causal convolutions, or convolutions with a smaller right context, together with self-attention restricted to a limited right context, to bound the effective right context of the input.
A model trained with such limitations can be used in streaming mode and gives exactly the same output and accuracy as when the whole audio is given to the model in offline mode.
These models can use a caching mechanism to store and reuse activations during streaming inference, avoiding duplicated computations as much as possible.
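
As a minimal sketch of this caching idea (a single causal convolution in plain PyTorch, not the actual NeMo encoder), streaming with a small cache of past frames reproduces the offline output exactly, without recomputing past frames:

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)

    # Toy causal 1D convolution (kernel size 3, left padding only).
    conv = nn.Conv1d(1, 1, kernel_size=3, bias=False)

    x = torch.randn(1, 1, 12)                    # [batch, channels, time]
    offline = conv(F.pad(x, (2, 0)))             # offline: pad kernel_size - 1 frames on the left

    cache = torch.zeros(1, 1, 2)                 # cache of the last kernel_size - 1 frames
    streamed = []
    for chunk in x.split(4, dim=-1):             # same audio processed in 3 chunks of 4 frames
        inp = torch.cat([cache, chunk], dim=-1)  # prepend the cached context
        streamed.append(conv(inp))
        cache = chunk[..., -2:]                  # keep the last 2 frames for the next chunk
    streamed = torch.cat(streamed, dim=-1)

    print(torch.allclose(offline, streamed, atol=1e-6))   # True: streaming matches offline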

We support the following three approaches to right-context modeling:
* fully causal model with zero look-ahead: tokens do not see any future tokens. All convolution layers are causal, and right tokens are masked for self-attention.
It gives zero latency but limited accuracy.
To train such a model, you need to set `encoder.att_context_size=[left_context, 0]` and `encoder.conv_context_size=causal` in the config.

* regular look-ahead: convolutions are able to see a few future frames, and self-attention also sees the same number of future tokens.
In this approach, the activations for the look-ahead part are not cached and are recalculated in the next chunks. The right context in each layer should be a small number, as multiple layers compound the effective context size and therefore the look-ahead size and latency.
For example, for a model of 17 layers with 4x downsampling and a 10ms window shift, even a right context of 2 in each layer means 17*2*10*4=1360ms of look-ahead (see the latency sketch after this list). Each step after the downsampling corresponds to 4*10=40ms.

* chunk-aware look-ahead: the input is split into equal chunks. Convolutions are fully causal, while self-attention layers are able to see all the tokens in their corresponding chunk.
For example, in a model with a chunk size of 20 tokens, a token at the first position of a chunk sees the next 19 tokens, while the last token of the chunk sees zero future tokens.
This approach is more efficient than regular look-ahead in terms of computation, as the activations for most of the look-ahead part are cached and there is close to zero duplication in the calculations.
In terms of accuracy, this approach gives similar or even better results than regular look-ahead, as each token in each layer has access to more tokens on average. That is why we recommend this approach for streaming.
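
As a back-of-the-envelope check of the latency figures above (using only the example numbers from the text; the chunk size and the worst-case chunked latency are assumptions of this sketch, not measured values):

.. code-block:: python

    # Example values from the text above; they are illustrative, not defaults.
    num_layers = 17
    downsampling = 4
    window_shift_ms = 10
    right_context_per_layer = 2            # tokens of right context in each layer

    ms_per_token = downsampling * window_shift_ms           # 4 * 10 = 40 ms per encoder step

    # Regular look-ahead: the right context compounds across layers.
    regular_lookahead_ms = num_layers * right_context_per_layer * ms_per_token
    print(regular_lookahead_ms)            # 1360 ms, as in the example above

    # Chunk-aware look-ahead: attention is confined to the chunk, so the look-ahead
    # does not grow with depth; the worst case is waiting for the rest of one chunk.
    chunk_size_tokens = 20                 # hypothetical chunk size
    worst_case_chunked_ms = (chunk_size_tokens - 1) * ms_per_token
    print(worst_case_chunked_ms)           # 760 ms worst case for this chunk size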


**Note:** Latencies are based on the assumption that the forward time of the network is zero.

Approaches with non-zero look-ahead can give significantly better accuracy by sacrificing latency. The latency can be controlled by the left context size.


In all modes, the left context is controlled by the number of tokens visible to self-attention and by the kernel size of the convolutions.
For example, if the left context of self-attention in each layer is set to 20 tokens and there are 10 layers of Conformer, the effective left context is 20*10=200 tokens.
The left context of self-attention for regular look-ahead can be set to any number, while for chunk-aware look-ahead it should be a multiple of the right context.
For convolutions, if we use a left context of 30 in such a model, the effective left context is 30*10=300 tokens.
The left context of the convolutions depends on their kernel size, while for self-attention layers it can be any number. A larger left context for self-attention means a larger cache and more computation in self-attention.
A self-attention left context of around 6 seconds gives results close to those with unlimited left context. For a model with 4x downsampling and a 10ms window shift in the preprocessor, each token corresponds to 4*10=40ms.
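
A small sketch of the same arithmetic for the left context (again using only the illustrative numbers from this paragraph):

.. code-block:: python

    # Illustrative numbers from the paragraph above.
    num_layers = 10
    att_left_context = 20          # per-layer self-attention left context (tokens)
    conv_left_context = 30         # per-layer convolution left context (tokens)
    ms_per_token = 4 * 10          # 4x downsampling * 10 ms window shift = 40 ms

    print(num_layers * att_left_context)    # 200 tokens of effective self-attention left context
    print(num_layers * conv_left_context)   # 300 tokens of effective convolution left context

    # ~6 seconds of audio corresponds to this many encoder tokens:
    print(6000 // ms_per_token)             # 150 tokens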

If the striding approach is used for downsampling, all the convolutions in the downsampling layers are fully causal and do not see future tokens.
It is recommended to use stacking downsampling for streaming models, as it is significantly faster and uses less memory.

Conformer-Transducer is the Conformer model introduced in :cite:`asr-models-gulati2020conformer` and uses RNNT/Transducer loss/decoder.
It has the same encoder as Conformer-CTC but utilizes RNNT/Transducer loss/decoder which makes it an autoregressive model.

Most of the config for Conformer-Transducer models is similar to Conformer-CTC, except for the sections related to the decoder and loss: decoder, loss, joint, and decoding.
You may take a look at our `tutorials page <../starthere/tutorials.html>` on Transducer models to become familiar with their configs:
`Introduction to Transducers <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Intro_to_Transducers.ipynb>` and `ASR with Transducers <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_with_Transducers.ipynb>`
You can find more details on the config files for the Conformer-Transducer models at `Conformer-CTC <./configs.html#conformer-ctc>`.

This model supports both the sub-word level and character level encodings. The variant with sub-word encoding is a BPE-based model
which can be instantiated using the :class:`~nemo.collections.asr.models.EncDecRNNTBPEModel` class, while the
character-based variant is based on :class:`~nemo.collections.asr.models.EncDecRNNTModel`.
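
As a minimal usage sketch (the pretrained model name below is an assumption and may differ from what is published on NGC):

.. code-block:: python

    import nemo.collections.asr as nemo_asr

    # Sub-word (BPE) Conformer-Transducer; the model name is an example, not a guarantee.
    asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
        model_name="stt_en_conformer_transducer_large"
    )
    transcripts = asr_model.transcribe(["audio_sample.wav"])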

You may find the example config files of Conformer-Transducer model with character-based encoding at
``<NeMo_git_root>/examples/asr/conf/conformer/conformer_transducer_char.yaml`` and
with sub-word encoding at ``<NeMo_git_root>/examples/asr/conf/conformer/conformer_transducer_bpe.yaml``.


.. _LSTM-Transducer_model:

LSTM-Transducer
@@ -98,7 +98,7 @@
help="Model downsampling factor, 8 for Citrinet models and 4 for Conformer models",
)
parser.add_argument(
- '--max_steps_per_timestep', type=int, default=5, help='Maximum number of tokens decoded per acoustic timestepB'
+ '--max_steps_per_timestep', type=int, default=5, help='Maximum number of tokens decoded per acoustic timestep'
)
parser.add_argument('--stateful_decoding', action='store_true', help='Whether to perform stateful decoding')
parser.add_argument('--device', default=None, type=str, required=False)
@@ -175,10 +175,10 @@ def main(args):
torch.set_grad_enabled(False)
if args.asr_model.endswith('.nemo'):
logging.info(f"Using local ASR model from {args.asr_model}")
- asr_model = nemo_asr.models.EncDecCTCModelBPE.restore_from(restore_path=args.asr_model)
+ asr_model = nemo_asr.models.ASRModel.restore_from(restore_path=args.asr_model)
else:
logging.info(f"Using NGC cloud ASR model {args.asr_model}")
- asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name=args.asr_model)
+ asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=args.asr_model)

cfg = copy.deepcopy(asr_model._cfg)
OmegaConf.set_struct(cfg.preprocessor, False)