[ASR] Fix for multi-channel signals in AudioSegment #4824
Merged
anteju force-pushed the fix/audiosegment-multi-channel branch 3 times, most recently from 82f8122 to 809c71a (August 29, 2022 22:10)
This pull request introduces 1 alert when merging 809c71a into d969162 (view on LGTM.com).
anteju force-pushed the fix/audiosegment-multi-channel branch 3 times, most recently from 03689d1 to 33c5053 (August 31, 2022 04:21)
anteju force-pushed the fix/audiosegment-multi-channel branch 2 times, most recently from 4cf3633 to 27e7993 (August 31, 2022 18:04)
yzhang123 previously approved these changes (Sep 1, 2022)
jbalam-nv reviewed (Sep 2, 2022)
Looks great
- Enable channel selector for AudioToText datasets for `transcribe` Signed-off-by: Ante Jukić <ajukic@nvidia.com>
anteju force-pushed the fix/audiosegment-multi-channel branch from ad4cf71 to 71759a3 (September 2, 2022 20:55)
anteju changed the title from "[Draft] Fix for multi-channel signals in AudioSegment" to "[ASR] Fix for multi-channel signals in AudioSegment" (Sep 2, 2022)
yzhang123 approved these changes (Sep 6, 2022)
jubick1337 pushed a commit to jubick1337/NeMo that referenced this pull request (Oct 3, 2022)
- Enable channel selector for AudioToText datasets for `transcribe` Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
Merged
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request (Nov 29, 2022)
- Enable channel selector for AudioToText datasets for `transcribe` Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request (Nov 29, 2022)
- Enable channel selector for AudioToText datasets for `transcribe` Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Ante Jukić <ajukic@nvidia.com> Signed-off-by: Hainan Xu <hainanx@nvidia.com>
Signed-off-by: Ante Jukić ajukic@nvidia.com
What does this PR do ?
Currently, `AudioSegment` aims to do channel averaging for multi-channel audio signals. However, samples in its constructor are arranged as `(num_channels, num_samples)`, instead of the assumed `(num_samples, num_channels)`. This means that `samples` of the created `AudioSegment` will have an unexpected size, and also ASR will fail when running `transcribe` on a multi-channel audio signal.
This can be tested by running `test_audiosegment.py` and `test_asr_multi_channel_audio.py` from the attached scripts in test_audiosegment_with_channel_selector.zip
Collection: ASR
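The shape mismatch described above can be reproduced with plain NumPy. This is only an illustrative sketch, not the code from `segment.py`: it assumes the averaging is taken over axis 1 (which is what a `(num_samples, num_channels)` layout would call for), and the array sizes are made up for the example.

```python
import numpy as np

num_channels, num_samples = 2, 16000

# Channels-first layout, as described for the constructor input above.
rng = np.random.default_rng(0)
samples = rng.standard_normal((num_channels, num_samples)).astype(np.float32)

# Assumption: averaging over axis 1 is only correct for a
# (num_samples, num_channels) layout. With channels-first data it
# collapses the time axis and leaves one value per channel.
wrong = np.mean(samples, axis=1)

# Averaging over the channel axis (axis 0 for this layout) keeps
# one value per time sample, which is what downmixing should produce.
right = np.mean(samples, axis=0)

print(wrong.shape)  # (2,)
print(right.shape)  # (16000,)
```

With the wrong axis, a two-channel signal ends up as a two-element array, which matches the "unexpected size" of `samples` mentioned above.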
Changelog
Necessary fix in `AudioSegment`
- `channel_selector`, which can be used to select a single channel (or a subset of channels) from a multi-channel audio signal
Changes above are in `segment.py` and `audio_utils.py`, and tests are in `test_preprocessing_segment.py` and `test_audio_utils.py`.
Nice to have: `channel_selector` for `transcribe`
- Added a `channel_selector` option to `transcribe`, and forwarded it through models, audio-to-text dataset and `WaveformFeaturizer` to `AudioSegment`
If preferred, this can be removed from the PR.
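To make the idea of a channel selector concrete, here is a minimal sketch of how such a selection could behave. The helper name `pick_channels`, the `"average"` keyword, and the `(num_samples, num_channels)` layout are assumptions made for illustration; this is not the implementation in `audio_utils.py`.

```python
from typing import Iterable, Optional, Union

import numpy as np

ChannelSelector = Optional[Union[str, int, Iterable[int]]]


def pick_channels(signal: np.ndarray, channel_selector: ChannelSelector = None) -> np.ndarray:
    """Hypothetical helper: select channels from a (num_samples, num_channels) signal.

    - None: return the signal unchanged
    - "average": average across channels
    - int: select a single channel
    - iterable of ints: select a subset of channels
    """
    if signal.ndim == 1 or channel_selector is None:
        # Single-channel input or no selection requested.
        return signal

    num_channels = signal.shape[1]

    if isinstance(channel_selector, str):
        if channel_selector == "average":
            return signal.mean(axis=1)
        raise ValueError(f"Unknown channel selector: {channel_selector!r}")

    if isinstance(channel_selector, (int, np.integer)):
        if not 0 <= channel_selector < num_channels:
            raise ValueError(f"Channel {channel_selector} out of range for {num_channels} channels")
        return signal[:, channel_selector]

    indices = list(channel_selector)
    if any(not 0 <= idx < num_channels for idx in indices):
        raise ValueError(f"Channels {indices} out of range for {num_channels} channels")
    return signal[:, indices]


# Usage: a 4-sample stereo signal with zeros on channel 0 and ones on channel 1.
stereo = np.stack([np.zeros(4), np.ones(4)], axis=1)
print(pick_channels(stereo, 1).shape)          # (4,)
print(pick_channels(stereo, "average").shape)  # (4,)
print(pick_channels(stereo, [0, 1]).shape)     # (4, 2)
```

Keeping the selection in one small function means the same selector value can be passed down unchanged from a high-level call to wherever the audio is actually loaded.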
Usage
Example: `AudioSegment` (`test_audiosegment.py`)
Example: `AudioSegment` (`test_asr_with_channel_selector.py`)
Tests
- `main` and `fix` branch for wav, flac, mp3 input (`test_asr_formats.py`, attached)
- `test_asr_with_channel_selector.py` (attached)
- `ASR_with_NeMo.ipynb` and `ASR_with_Transducers.ipynb` running fine with the `fix` branch
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information