[ASR] Support for transcription of multi-channel audio for AED models #9007

anteju · 2024-04-23T00:20:13Z

What does this PR do?

Currently, AED models do not use the channel selector.
If the input manifest points to multi-channel audio files, transcription fails with

AssertionError: Expected MonoCut.

This PR adds support for:

- setting channel_selector to an integer value, e.g., 0 or 1, to always select the first or second channel in the input file
- setting channel_selector to a string value to use a field from the input manifest to select the channel

Collection: ASR

Changelog

Propagate channel_selector in _setup_transcribe_dataloader
Add a map with _select_channel in nemo/collections/common/data/lhotse/dataloader.py

Usage

Example usage, assuming a Canary model is loaded as canary_model:

# Choose one of the following settings
channel_selector = 0  # use the first channel
channel_selector = 1  # use the second channel
channel_selector = 'reference_channel'  # use the 'reference_channel' field from the manifest file

predicted_text = canary_model.transcribe(
    mc_manifest,  # manifest with multi-channel audio
    batch_size=1,  # batch size to run the inference with
    channel_selector=channel_selector,  # channel to transcribe
)

Similarly, it can be used with transcribe_speech.py:

# Choose one of the following settings
CHANNEL_SELECTOR=0 # use the first channel
CHANNEL_SELECTOR=1 # use the second channel
CHANNEL_SELECTOR=reference_channel # use 'reference_channel' field from the manifest file

python ${NEMO_ROOT}/examples/asr/transcribe_speech.py \
    pretrained_name=nvidia/canary-1b \
    dataset_manifest=./mc_manifest.json \
    channel_selector=${CHANNEL_SELECTOR}
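
For the string form of channel_selector, each manifest entry needs a field with the channel index. The snippet below is a minimal sketch of how such a manifest might be written. The audio paths, durations, and the write_manifest helper are illustrative assumptions, not files or functions from this PR; only the reference_channel field name and the mc_manifest.json filename are taken from the examples above.

```python
import json
import tempfile
from pathlib import Path


def write_manifest(path, entries):
    """Write one JSON object per line (JSON-lines manifest)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")


# Hypothetical entries: the paths and durations are placeholders.
entries = [
    {"audio_filepath": "audio/sample_0.wav", "duration": 4.2, "reference_channel": 0},
    {"audio_filepath": "audio/sample_1.wav", "duration": 3.7, "reference_channel": 1},
]

# Written to a temporary directory so the sketch does not touch real data.
manifest_path = Path(tempfile.mkdtemp()) / "mc_manifest.json"
write_manifest(manifest_path, entries)
```

With channel_selector set to 'reference_channel', the first entry would be transcribed from channel 0 and the second from channel 1.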

Toy Example:

Scripts & toy dataset:

pr_example_01.tar.gz

Run as

# test transcribe_speech.py
bash test_transcribe.sh 0
bash test_transcribe.sh 1
bash test_transcribe.sh reference_channel

# test python API
python test_transcribe.py --channel-selector 0
python test_transcribe.py --channel-selector 1
python test_transcribe.py --channel-selector reference_channel

Jenkins CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

There's no need to comment jenkins on the PR to trigger Jenkins CI.
The GitHub Actions CI will run automatically when the PR is opened.
To run CI on an untrusted fork, a NeMo user with write access must click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to "Support for multi-channel audio data" #8728

anteju · 2024-04-23T17:19:47Z

jenkins

pzelasko

Looking good! Left two comments.

pzelasko · 2024-04-25T19:28:23Z

nemo/collections/common/data/lhotse/dataloader.py

+    # Apply channel selector
+    if config.channel_selector is not None:
+        logging.info('Using channel selector %s.', config.channel_selector)
+        cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector), apply_fn=None)


We should leave the default behavior of .map here, otherwise it might try to apply this to text-only examples.

Suggested change

-        cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector), apply_fn=None)
+        cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector))
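
The concern in this comment can be illustrated with a toy pipeline. This is a sketch under stated assumptions: lazy_map, is_audio, AudioExample and TextExample are hypothetical stand-ins, not the Lhotse API, and the sketch assumes that the default filter restricts the transform to audio examples, as the comment implies.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Optional


@dataclass
class AudioExample:
    """Hypothetical audio example with a channel count."""
    num_channels: int


@dataclass
class TextExample:
    """Hypothetical text-only example; it has no channels."""
    text: str


def is_audio(example: Any) -> bool:
    return isinstance(example, AudioExample)


def lazy_map(
    examples: Iterable[Any],
    transform_fn: Callable[[Any], Any],
    apply_fn: Optional[Callable[[Any], bool]] = is_audio,
) -> Iterator[Any]:
    """Apply transform_fn only where apply_fn accepts the example.

    Passing apply_fn=None disables the filter, so every example is transformed.
    """
    for example in examples:
        if apply_fn is None or apply_fn(example):
            yield transform_fn(example)
        else:
            yield example


def keep_first_channel(example: AudioExample) -> AudioExample:
    # Reads num_channels, so it only works on audio examples.
    if example.num_channels == 1:
        return example
    return AudioExample(num_channels=1)


mixed = [AudioExample(num_channels=2), TextExample(text="hello")]

# Default filter: the text-only example passes through untouched.
filtered = list(lazy_map(mixed, keep_first_channel))
```

In this toy version, calling lazy_map with apply_fn=None on the same mixed list fails with an AttributeError, because the text-only example has no num_channels attribute.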

pzelasko · 2024-04-25T19:30:44Z

nemo/collections/common/data/lhotse/dataloader.py

+            f"Channel index {channel_idx} is larger than the actual number of channels {cut.num_channels}"
+        )
+
+    return cut.with_channels(channel_idx)


hmmm .with_channels is only defined on MultiCut, perhaps we should add a check like:

if cut.num_channels == 1:
    return cut
else:
    return cut.with_channels(channel_idx)

WDYT?

anteju · 2024-04-29

Makes sense, added this (will push in a bit).
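
Putting the pieces of this thread together, the selection logic might look roughly like the sketch below. It is not the merged implementation: ToyCut is a hypothetical stand-in for a Lhotse cut, and reading the channel index from a custom dictionary on the cut is an assumption about how the manifest field is exposed. Only the error message, the single-channel check, and the with_channels call are taken from the diff and comments above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple, Union


@dataclass
class ToyCut:
    """Hypothetical stand-in for a Lhotse cut; not the real class."""
    channels: Tuple[int, ...]
    custom: Dict[str, Any] = field(default_factory=dict)

    @property
    def num_channels(self) -> int:
        return len(self.channels)

    def with_channels(self, channel: int) -> "ToyCut":
        # Return a new single-channel cut that keeps the custom fields.
        return ToyCut(channels=(channel,), custom=self.custom)


def select_channel(cut: ToyCut, channel_selector: Union[int, str]) -> ToyCut:
    """Resolve the selector to a channel index and pick that channel."""
    if isinstance(channel_selector, int):
        channel_idx = channel_selector
    elif isinstance(channel_selector, str):
        # Assumption: manifest fields are available in cut.custom.
        if channel_selector not in cut.custom:
            raise ValueError(f"Channel selector {channel_selector} not found in the cut")
        channel_idx = int(cut.custom[channel_selector])
    else:
        raise ValueError(f"Unsupported channel selector: {channel_selector!r}")

    if channel_idx >= cut.num_channels:
        raise ValueError(
            f"Channel index {channel_idx} is larger than the actual number of channels {cut.num_channels}"
        )

    if cut.num_channels == 1:
        # Single-channel input: nothing to select, return the cut unchanged.
        return cut
    return cut.with_channels(channel_idx)


stereo = ToyCut(channels=(0, 1), custom={"reference_channel": 1})
first = select_channel(stereo, 0)
from_manifest = select_channel(stereo, "reference_channel")
```

Here first keeps channel 0 and from_manifest keeps channel 1, while a single-channel cut with selector 0 is returned as-is without calling with_channels.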

pzelasko

LGTM

anteju · 2024-04-29T22:43:13Z

jenkins

anteju requested a review from pzelasko April 23, 2024 00:20

github-actions bot added ASR common labels Apr 23, 2024

anteju mentioned this pull request Apr 23, 2024

Support for multi-channel audio data #8728

Closed

anteju force-pushed the feature/asr-channel-selector-from-manifest branch 3 times, most recently from 1e7a297 to 5a7bc84 Compare April 23, 2024 17:19

anteju force-pushed the feature/asr-channel-selector-from-manifest branch from 5a7bc84 to 043bf0c Compare April 24, 2024 21:31

pzelasko requested changes Apr 25, 2024

anteju force-pushed the feature/asr-channel-selector-from-manifest branch from 9ed7e9b to 6fc9b7f Compare April 29, 2024 18:32

anteju requested a review from pzelasko April 29, 2024 18:32

anteju force-pushed the feature/asr-channel-selector-from-manifest branch 2 times, most recently from 8337553 to 8463836 Compare April 29, 2024 20:48

pzelasko approved these changes Apr 29, 2024

anteju added 3 commits April 30, 2024 09:36

Propagate channel selector for AED model + add channel selector to get_lhotse_dataloader_from config

0986dbc

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

Included comments

27b6d6a

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

Added unit test

a8cfda6

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

anteju force-pushed the feature/asr-channel-selector-from-manifest branch from 8463836 to a8cfda6 Compare April 30, 2024 16:36

anteju added the Run CICD label Apr 30, 2024

anteju merged commit fe4b291 into NVIDIA:main Apr 30, 2024
126 checks passed
