[ASR] Support for transcription of multi-channel audio for AED models #9007
Looking good! Left two comments.
    # Apply channel selector
    if config.channel_selector is not None:
        logging.info('Using channel selector %s.', config.channel_selector)
        cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector), apply_fn=None)
We should leave the default behavior of .map here, otherwise it might try to apply this to text-only examples.
    - cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector), apply_fn=None)
    + cuts = cuts.map(partial(_select_channel, channel_selector=config.channel_selector))
Done.
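The concern in this thread can be illustrated with a minimal, self-contained sketch. It does not use Lhotse; `AudioExample`, `TextExample`, `is_audio`, `filtered_map`, and `to_first_channel` are hypothetical stand-ins, and the assumption (taken from the comment above) is that the default predicate of `.map` restricts the transform to audio cuts, while `apply_fn=None` applies it to every example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class AudioExample:
    """Hypothetical stand-in for an audio cut."""
    num_channels: int


@dataclass
class TextExample:
    """Hypothetical stand-in for a text-only example (no audio attributes)."""
    text: str


def is_audio(example: object) -> bool:
    """Predicate assumed to play the role of the default filter in .map."""
    return isinstance(example, AudioExample)


def filtered_map(
    examples: Iterable[object],
    transform_fn: Callable[[object], object],
    apply_fn: Optional[Callable[[object], bool]] = is_audio,
) -> Iterator[object]:
    """Apply transform_fn only where apply_fn is true; pass other items through."""
    for example in examples:
        if apply_fn is None or apply_fn(example):
            yield transform_fn(example)
        else:
            yield example


def to_first_channel(example: AudioExample) -> AudioExample:
    """Toy transform that reads an audio-only attribute."""
    if example.num_channels == 1:
        return example
    return AudioExample(num_channels=1)


mixed = [AudioExample(num_channels=2), TextExample(text="hello")]

# Default predicate: the text-only example is passed through untouched.
default_result = list(filtered_map(mixed, to_first_channel))

# With apply_fn=None the transform also runs on the text-only example
# and fails, because TextExample has no num_channels attribute.
try:
    list(filtered_map(mixed, to_first_channel, apply_fn=None))
    unfiltered_failed = False
except AttributeError:
    unfiltered_failed = True
```

In this sketch the default predicate leaves the text-only example unchanged, while disabling the predicate makes the audio-specific transform raise on it.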
        f"Channel index {channel_idx} is larger than the actual number of channels {cut.num_channels}"
    )

return cut.with_channels(channel_idx)
hmmm, .with_channels is only defined on MultiCut, perhaps we should add a check like:
    if cut.num_channels == 1:
        return cut
    else:
        return cut.with_channels(channel_idx)
WDYT?
Makes sense, added this (will push in a bit).
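A rough sketch of the guarded selection discussed in this thread, using stand-in classes rather than Lhotse's own cut types. `MonoStandIn`, `MultiStandIn`, and `select_channel_sketch` are hypothetical names, and placing the bounds check before the single-channel shortcut is an assumption, not necessarily the order used in the merged code.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class MonoStandIn:
    """Hypothetical single-channel cut: note it has no with_channels method."""
    num_channels: int = 1


@dataclass(frozen=True)
class MultiStandIn:
    """Hypothetical multi-channel cut exposing with_channels."""
    num_channels: int
    selected: Tuple[int, ...] = ()

    def with_channels(self, channel_idx: int) -> "MultiStandIn":
        # Return a copy restricted to the requested channel.
        return MultiStandIn(num_channels=1, selected=(channel_idx,))


def select_channel_sketch(
    cut: Union[MonoStandIn, MultiStandIn], channel_idx: int
) -> Union[MonoStandIn, MultiStandIn]:
    """Select one channel, leaving single-channel cuts untouched."""
    if channel_idx >= cut.num_channels:
        raise ValueError(
            f"Channel index {channel_idx} is larger than the actual number of channels {cut.num_channels}"
        )
    if cut.num_channels == 1:
        # Single-channel input: nothing to select, and with_channels may not exist.
        return cut
    return cut.with_channels(channel_idx)


mono = MonoStandIn()
stereo = MultiStandIn(num_channels=2)

mono_out = select_channel_sketch(mono, 0)      # returned unchanged
stereo_out = select_channel_sketch(stereo, 1)  # restricted to channel 1
```

Without the `num_channels == 1` branch, the call on `mono` would fail with an `AttributeError` in this sketch, which mirrors the problem raised in the review comment.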
LGTM
…t_lhotse_dataloader_from config Signed-off-by: Ante Jukić <ajukic@nvidia.com>
…#9007)
* Propagate channel selector for AED model + add channel selector to get_lhotse_dataloader_from config
  Signed-off-by: Ante Jukić <ajukic@nvidia.com>
* Included comments
  Signed-off-by: Ante Jukić <ajukic@nvidia.com>
* Added unit test
  Signed-off-by: Ante Jukić <ajukic@nvidia.com>
---------
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: Ao Tang <aot@nvidia.com>
What does this PR do?
Currently, AED models do not use the channel selector.
If the input manifest points to multi-channel audio files, transcription fails.
This PR adds support for setting
- channel_selector to an integer value, e.g., 0/1, to always select the first/second channel in the input file
- channel_selector to a string value, to use a field from the input manifest to select the channel
Collection: ASR
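The two selector forms described above can be sketched as follows. This is a simplified illustration, not the NeMo implementation: `resolve_channel_index` and the manifest field name `target_channel` are hypothetical, and the manifest entry is modeled as a plain dictionary.

```python
from typing import Mapping, Union


def resolve_channel_index(
    selector: Union[int, str], manifest_entry: Mapping[str, object]
) -> int:
    """Turn a channel selector into a concrete channel index.

    An integer selector is used directly for every example; a string
    selector names a field of the manifest entry that holds the index.
    """
    if isinstance(selector, int):
        return selector
    if isinstance(selector, str):
        if selector not in manifest_entry:
            raise KeyError(f"Manifest entry has no field '{selector}'")
        return int(manifest_entry[selector])
    raise TypeError(f"Unsupported channel selector type: {type(selector).__name__}")


# Hypothetical manifest entry with a per-example channel field.
entry = {"audio_filepath": "example.wav", "target_channel": 1}

fixed_idx = resolve_channel_index(0, entry)                    # always the first channel
per_entry_idx = resolve_channel_index("target_channel", entry)  # read from the manifest
```

An integer selector gives the same channel for every file, while a string selector lets each manifest entry name its own channel.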
Changelog
- channel_selector in _setup_transcribe_dataloader
- _select_channel in nemo/collections/common/data/lhotse/dataloader.py
Usage
Example of use, assuming Canary model is loaded
Similarly, it can be used with transcribe_speech.py
Toy Example:
Scripts & toy dataset:
Run as
Jenkins CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
There's no need to comment jenkins on the PR to trigger Jenkins CI.
The GitHub Actions CI will run automatically when the PR is opened.
To run CI on an untrusted fork, a NeMo user with write access must click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information