[ASR] Fix for multi-channel signals in AudioSegment #4824

anteju · 2022-08-27T00:23:51Z

Signed-off-by: Ante Jukić ajukic@nvidia.com

What does this PR do ?

Currently, AudioSegment aims to do channel averaging for multi-channel audio signals:

if self._samples.ndim >= 2:
  self._samples = np.mean(self._samples, 1)

However, the samples passed to its constructor are arranged as (num_channels, num_samples) rather than the assumed (num_samples, num_channels), so the mean is taken over the sample axis instead of the channel axis. As a result, the samples of the created AudioSegment have an unexpected size, and ASR fails when running transcribe on a multi-channel audio signal.
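The shape mismatch can be reproduced with NumPy alone. The snippet below is a minimal sketch, not code from the PR: the array sizes and variable names are made up for illustration, and the array is assumed to follow the (num_channels, num_samples) layout described above.

```python
import numpy as np

# Illustrative sizes only: a 2-channel signal with 16000 samples,
# laid out as (num_channels, num_samples).
num_channels, num_samples = 2, 16000
samples = np.zeros((num_channels, num_samples), dtype=np.float32)
samples[1] = 1.0  # channel 0 is all zeros, channel 1 is all ones

# Averaging over axis 1 collapses the sample axis for this layout,
# leaving one value per channel instead of one value per sample.
collapsed_samples = np.mean(samples, 1)

# Averaging over axis 0 collapses the channel axis, which is the
# intended channel averaging for this layout.
channel_average = np.mean(samples, 0)

print(collapsed_samples.shape)  # (2,)
print(channel_average.shape)    # (16000,)
```

With the layout assumed by the original code, (num_samples, num_channels), averaging over axis 1 would have been correct, which is why the bug only shows up for multi-channel input.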

This can be tested by running test_audiosegment.py and test_asr_multi_channel_audio.py from the attached scripts in test_audiosegment_with_channel_selector.zip.

Collection: ASR

Changelog

Necessary fix in `AudioSegment`

- Corrected the dimension ordering for multi-channel audio
- Added a new parameter `channel_selector`, which can be used to select a single channel (or a subset of channels) from a multi-channel audio signal
- Added unit tests for both the channel selector and `AudioSegment`

Changes above are in segment.py and audio_utils.py and tests are in test_preprocessing_segment.py and test_audio_utils.py.
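The behavior of a channel selector described in this changelog can be sketched as follows. This is a hypothetical helper written for illustration, not the implementation in audio_utils.py: the function name `select_channels`, the (num_samples, num_channels) input layout, and the error handling are all assumptions.

```python
import numpy as np


def select_channels(samples, channel_selector=None):
    """Hypothetical sketch of channel selection.

    Assumes `samples` has shape (num_samples,) or (num_samples, num_channels).
    """
    if samples.ndim == 1 or channel_selector is None:
        # Single-channel input or no selector: return the signal unchanged.
        return samples
    if isinstance(channel_selector, str):
        if channel_selector == 'average':
            # Channel-wise average, giving a single-channel signal.
            return samples.mean(axis=-1)
        raise ValueError(f'Unknown channel selector: {channel_selector}')
    if isinstance(channel_selector, int):
        # Pick a single channel by index.
        return samples[:, channel_selector]
    # Otherwise treat the selector as an iterable of channel indices.
    return samples[:, list(channel_selector)]


# Example with a made-up 3-channel signal of 8 samples.
signal = np.arange(24, dtype=np.float32).reshape(8, 3)
print(select_channels(signal).shape)             # (8, 3)
print(select_channels(signal, 'average').shape)  # (8,)
print(select_channels(signal, 0).shape)          # (8,)
print(select_channels(signal, [0, 2]).shape)     # (8, 2)
```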

Nice to have: `channel_selector` for `transcribe`

Added a channel_selector option to transcribe and forwarded it through the models, the audio-to-text dataset, and WaveformFeaturizer to AudioSegment.
This is a convenience when running an existing model on a multi-channel audio file: we can select either one of the channels or the channel-wise average.

If preferred, this can be removed from the PR.

Usage

Check the attached scripts: test_audiosegment_with_channel_selector.zip

Example: `AudioSegment`

Check test_audiosegment.py

AudioSegment.from_file(audio_file) # loads the complete signal (including all channels)
AudioSegment.from_file(audio_file, channel_selector='average') # loads a single-channel signal (average across channels)
AudioSegment.from_file(audio_file, channel_selector=0) # loads only first channel
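To try the calls above without the attached scripts, a multi-channel test file can be generated with the Python standard library. This is a sketch under stated assumptions: the helper name `write_test_wav`, the file name, and the sine frequencies are hypothetical and are not part of the PR or of NeMo.

```python
import math
import struct
import wave


def write_test_wav(path, num_channels=2, num_samples=16000, sample_rate=16000):
    """Write a 16-bit PCM WAV file; channel c holds a sine at 220 * (c + 1) Hz."""
    frames = bytearray()
    for n in range(num_samples):
        # WAV stores frames interleaved: one sample per channel per time step.
        for c in range(num_channels):
            value = 0.5 * math.sin(2 * math.pi * 220 * (c + 1) * n / sample_rate)
            frames += struct.pack('<h', int(value * 32767))
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(num_channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample (16-bit PCM)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return path


if __name__ == '__main__':
    # Hypothetical file name; pass the result as `audio_file` in the calls above.
    audio_file = write_test_wav('two_channel_test.wav')
    print(audio_file)
```

Because each channel carries a different frequency, it is easy to tell by ear or by spectrum whether a single channel or the average was loaded.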

Example: `transcribe`

Check test_asr_with_channel_selector.py

asr_model.transcribe(mc_files, channel_selector='average') # apply channel averaging when loading audio

Tests

Existing unit tests + newly-added unit tests
Verified that transcriptions of single-channel signals match between the main and fix branches for wav, flac, and mp3 input (test_asr_formats.py, attached)
Verified that the transcription of one channel selected from a multi-channel signal matches the transcription of the corresponding single-channel input signal (test_asr_with_channel_selector.py, attached)
Tutorials ASR_with_NeMo.ipynb and ASR_with_Transducers.ipynb run fine with the fix branch

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

n/a

lgtm-com · 2022-08-29T22:27:51Z

This pull request introduces 1 alert when merging 809c71a into d969162 - view on LGTM.com

new alerts:

1 for Unused import

jbalam-nv

Looks great

nemo/collections/asr/parts/utils/audio_utils.py

- Enable channel selector for AudioToText datasets for `transcribe` Signed-off-by: Ante Jukić <ajukic@nvidia.com>

- Enable channel selector for AudioToText datasets for `transcribe` Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>

- Enable channel selector for AudioToText datasets for `transcribe` Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com>

anteju force-pushed the fix/audiosegment-multi-channel branch 3 times, most recently from 82f8122 to 809c71a Compare August 29, 2022 22:10

anteju force-pushed the fix/audiosegment-multi-channel branch 3 times, most recently from 03689d1 to 33c5053 Compare August 31, 2022 04:21

anteju requested review from titu1994 and jbalam-nv August 31, 2022 17:50

anteju force-pushed the fix/audiosegment-multi-channel branch 2 times, most recently from 4cf3633 to 27e7993 Compare August 31, 2022 18:04

yzhang123 previously approved these changes Sep 1, 2022


jbalam-nv reviewed Sep 2, 2022


nemo/collections/asr/parts/utils/audio_utils.py (outdated, resolved)

- Fix for multi-channel signals in AudioSegment

71759a3

- Enable channel selector for AudioToText datasets for `transcribe` Signed-off-by: Ante Jukić <ajukic@nvidia.com>

anteju dismissed yzhang123’s stale review via 71759a3 September 2, 2022 20:55

anteju force-pushed the fix/audiosegment-multi-channel branch from ad4cf71 to 71759a3 Compare September 2, 2022 20:55

anteju marked this pull request as ready for review September 2, 2022 21:01

anteju changed the title ~~[Draft] Fix for multi-channel signals in AudioSegment~~ [ASR] Fix for multi-channel signals in AudioSegment Sep 2, 2022

anteju mentioned this pull request Sep 2, 2022

[ASR] Generate multichannel noise #4870

Merged


anteju requested review from yzhang123, jbalam-nv and VahidooX September 6, 2022 16:58

yzhang123 approved these changes Sep 6, 2022


yzhang123 merged commit 760d0c8 into NVIDIA:main Sep 6, 2022

anteju mentioned this pull request Oct 25, 2022

Update perturb.py #5231

Merged
