
add exportable mel spec #5512

Merged — titu1994 merged 2 commits into NVIDIA:main on Nov 28, 2022
Conversation

1-800-BAD-CODE (Contributor)

Signed-off-by: shane carroll <shane.carroll@utsa.edu>

What does this PR do?

AudioToMelSpectrogramPreprocessor accepts a bool argument use_torchaudio which, if True, switches the featurizer to a torchaudio-based extractor that produces the same features but with an exportable graph.

The new preprocessor mimics the old implementation closely enough that it can be swapped into pre-trained models, which can then be exported.

The preprocessor can be exported to JIT; ONNX export is blocked by pytorch/pytorch#81075
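
For illustration, a minimal sketch of enabling the flag when constructing the preprocessor directly (the features value is an assumption for this sketch; other arguments keep their defaults):

from nemo.collections.asr.modules.audio_preprocessing import AudioToMelSpectrogramPreprocessor

# Build the preprocessor with the new flag; all other options keep their defaults.
preprocessor = AudioToMelSpectrogramPreprocessor(
    sample_rate=16000,
    features=80,           # assumed value for this sketch
    use_torchaudio=True,   # select the exportable torchaudio-based featurizer
)
print(type(preprocessor.featurizer))  # the torchaudio-backed featurizer when the flag is set

The full usage script below does the same thing via the config of a pre-trained model instead of constructing the module by hand.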

Collection: ASR

Changelog

Add a use_torchaudio option to AudioToMelSpectrogramPreprocessor which selects the featurizer implementation.

Add a class FilterbankFeaturesTA, analogous to FilterbankFeatures, built on torchaudio (a minimal sketch of the idea follows).
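
For intuition, here is a minimal sketch of a torchaudio-based log-mel featurizer. This is an illustration of the approach, not the PR's actual FilterbankFeaturesTA code; the class name and parameter values are assumptions:

import torch
import torchaudio


class LogMelFeaturizerSketch(torch.nn.Module):
    # Illustrative only: the real FilterbankFeaturesTA also handles options
    # such as dithering, pre-emphasis, normalization, and length computation.
    def __init__(self, sample_rate: int = 16000, n_fft: int = 512,
                 win_length: int = 400, hop_length: int = 160, n_mels: int = 80):
        super().__init__()
        # torchaudio's MelSpectrogram is composed of exportable ops, which is
        # what makes the resulting graph TorchScript-friendly.
        self.mel = torchaudio.transforms.MelSpectrogram(
            sample_rate=sample_rate, n_fft=n_fft, win_length=win_length,
            hop_length=hop_length, n_mels=n_mels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Log compression with a small floor for numerical stability.
        return torch.log(self.mel(x) + 1e-9)

A scripted version of such a module (torch.jit.script(LogMelFeaturizerSketch())) exports cleanly, which is the property this PR relies on.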

Usage

The following script will:

  1. Load a pre-trained English Conformer
  2. Create a copy of the model's preprocessor, but with a torchaudio backend
  3. Compare new and old preprocessor outputs
  4. Export the preprocessor to JIT
  5. Compare JIT and PyTorch outputs
  6. Swap the pre-trained model's preprocessor for the new one and check that the WER matches the old preprocessor's

from pathlib import Path

import torch
import hydra
from pytorch_lightning import seed_everything
from omegaconf import open_dict

from nemo.utils import logging
from nemo.utils.nemo_logging import Logger
from nemo.collections.asr.metrics.wer import word_error_rate
from nemo.collections.asr.models import ASRModel
from nemo.collections.asr.modules.audio_preprocessing import AudioToMelSpectrogramPreprocessor


# Optionally use a seed
# seed_everything(42)

logging.set_verbosity(Logger.CRITICAL)

# Get a pre-trained ASR model to compare preprocessors. Any model with a mel spec extractor should work.
m: ASRModel = ASRModel.from_pretrained("stt_en_conformer_ctc_small", map_location=torch.device("cpu"))
m.eval()
old_preprocessor = m.preprocessor

# Extract preprocessor config and set the flag to use the torchaudio-based extractor; keep all other arguments the same
new_config = m.cfg.preprocessor
with open_dict(new_config):
    new_config.use_torchaudio = True
# Instantiate an instance that uses torchaudio on the backend
new_preprocessor: AudioToMelSpectrogramPreprocessor = hydra.utils.instantiate(config=new_config)
new_preprocessor.eval()
print(f"New preprocessor featurizer type: {type(new_preprocessor.featurizer)}")

# Export the torchaudio preprocessor and load it back in as a `ScriptModule`.
new_preprocessor.export("tmp.pt")
jit_preprocessor = torch.jit.load("tmp.pt")

# Generate random input
batch_size = 4
max_length = 16000
signals = torch.randn(size=[batch_size, max_length])
lengths = torch.randint(low=200, high=max_length, size=[batch_size])
lengths[0] = max_length

# Extract features with all preprocessors
old_feats, old_feat_lens = old_preprocessor(input_signal=signals, length=lengths)
new_feats, new_feat_lens = new_preprocessor(input_signal=signals, length=lengths)
jit_feats, jit_feat_lens = jit_preprocessor(input_signal=signals, length=lengths)

# Make sure new output matches old output
# Need to relax the tolerance from defaults. We will check WER also, as an alternative verification of correctness.
rel_tolerance = 1e-2
abs_tolerance = 1e-4
torch.testing.assert_allclose(actual=new_feats, expected=old_feats, atol=abs_tolerance, rtol=rel_tolerance)
# Zero tolerance for integer lengths.
torch.testing.assert_allclose(actual=new_feat_lens, expected=old_feat_lens, atol=0, rtol=0)

print(f"Output comparison passed with relative tolerance {rel_tolerance} and absolute tolerance {abs_tolerance}.")

# Make sure JIT output matches PyTorch output
# Keep tolerance at defaults for JIT comparison
torch.testing.assert_allclose(actual=jit_feats, expected=new_feats)
torch.testing.assert_allclose(actual=jit_feat_lens, expected=new_feat_lens, atol=0, rtol=0)
print(f"Jit comparison passed with default tolerance.")

# To run a WER check you'll need to comment out some assumptions in the CTC transcribe method, as addressed in https://github.com/NVIDIA/NeMo/pull/2762 
# print("Testing WER with old/new preprocessor with some LibriSpeech data")
# We only need audio files; we're comparing model outputs to each other, not references
# dev_other_dir = "/path/to/LibriSpeech/dev-other"
# num_files_to_use = 100
# audio_files = [str(x) for x in Path(dev_other_dir).rglob("*.flac")]
# audio_files = audio_files[:num_files_to_use]
# print("Transcribing with the baseline model")
# old_output = m.transcribe(audio_files)
# m.preprocessor = new_preprocessor
# print("Transcribing after switching the preprocessor")
# new_output = m.transcribe(audio_files)
# wer = word_error_rate(hypotheses=new_output, references=old_output)
# print(f"WER with {len(audio_files)} audio files using old vs. new preprocessor is {wer}")

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.

Additional Information

  • Related to: this issue comes up once in a while and is generally brushed aside.

Signed-off-by: shane carroll <shane.carroll@utsa.edu>
github-actions bot added the ASR label on Nov 27, 2022
@titu1994 (Collaborator)

Thanks for your awesome PR! I'll review it today.

@titu1994 (Collaborator)

This is fantastic! I was wondering if we could simply subclass and override the methods of the older code, but I don't think it's necessary. This is much cleaner, though it does support only a subset of the other featurizer's functionality.

I'll send a PR later today to add some unit tests like the ones you posted above, to ensure the values from the two implementations remain the same.

@titu1994 titu1994 merged commit 21b088b into NVIDIA:main Nov 28, 2022
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
* add exportable mel spec

Signed-off-by: shane carroll <shane.carroll@utsa.edu>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: shane carroll <shane.carroll@utsa.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
hainan-xv pushed the same commit to hainan-xv/NeMo, referencing this pull request, again on Nov 29, 2022 and on Dec 5, 2022.
andrusenkoau pushed a commit with the same message (signed off by andrusenkoau <andrusenkoau@gmail.com>) to andrusenkoau/NeMo, referencing this pull request, on Jan 5, 2023.