Lhotse AudioToAudio dataset (supports ref recording and embedding) #8477

Merged 22 commits on Apr 16, 2024

@pzelasko pzelasko commented Feb 21, 2024

What does this PR do?

We're adding several features to enable audio-to-audio dataloading:

  • Conversion script from NeMo audio-to-audio manifests to Lhotse format
  • Audio-to-audio Lhotse dataset (covers a2a, a2a with reference and a2a with embedding cases)
  • Auto-supports Lhotse Shar (tarred) format for audio-to-audio
  • Options to cut/truncate examples into smaller chunks on the fly
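
As a rough illustration of the last item, cutting an example into windows amounts to computing (start, end) spans over its duration. The sketch below is not the code from this PR; the function name `cut_into_windows` and its `window`/`hop` parameters are hypothetical and only show the idea.

```python
def cut_into_windows(duration, window, hop=None):
    """Return (start, end) spans in seconds that cover [0, duration].

    Hypothetical helper for illustration only; `hop` defaults to `window`,
    which gives non-overlapping chunks. The last span may be shorter.
    """
    hop = window if hop is None else hop
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    spans = []
    start = 0.0
    while start < duration:
        spans.append((start, min(start + window, duration)))
        start += hop
    return spans


# A 10 s example cut into 4 s windows: two full windows and a 2 s remainder.
spans = cut_into_windows(10.0, 4.0)
```

Truncation is the simpler case: keep only the first span and drop the rest.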

Collection: All speech collections

Changelog

  • Conversion script from NeMo audio-to-audio manifests to Lhotse format
  • Audio-to-audio Lhotse dataset (covers a2a, a2a with reference and a2a with embedding cases)
  • Auto-supports Lhotse Shar (tarred) format for audio-to-audio
  • Options to cut/truncate examples into smaller chunks on the fly

Usage

The general workflow is as follows:

# First convert NeMo manifest to Lhotse:
python scripts/audio_to_audio/convert_nemo_to_lhotse.py \
  nemo_manifest.json \
  lhotse_manifest.jsonl.gz \
  -i input_filepath \
  -t target_filepath

# Then, convert to Lhotse Shar (tarred) format.
# Meaning of args:
#   -j4 runs 4 jobs
#   -v enables verbose output
#   -s 100 sets the shard size
#   -a flac sets the audio format
#   -c specifies the custom fields to export and their format; see lhotse shar export --help
lhotse shar export \
  -j4 -v -s 100 \
  -a flac \
  -c target_recording:flac \
  lhotse_manifest.jsonl.gz \
  data_shar
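
For orientation, here is a minimal sketch of what the input NeMo manifest might look like, assuming the `-i` and `-t` values above name the manifest keys that hold the input and target audio paths. The file paths, the `duration` field, and the helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical entries; the key names match the -i / -t values used above.
entries = [
    {"input_filepath": "noisy/utt1.wav", "target_filepath": "clean/utt1.wav", "duration": 3.2},
    {"input_filepath": "noisy/utt2.wav", "target_filepath": "clean/utt2.wav", "duration": 4.7},
]


def write_manifest(path, items):
    """Write one JSON object per line, as NeMo manifests conventionally do."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")


def read_manifest(path):
    """Read the manifest back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


manifest_path = Path(tempfile.mkdtemp()) / "nemo_manifest.json"
write_manifest(manifest_path, entries)
```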

Then you can instantiate the dataloader as follows:

from omegaconf import OmegaConf
from nemo.collections.common.data.lhotse import get_lhotse_dataloader_from_config
from nemo.collections.asr.data.audio_to_audio_lhotse import LhotseAudioToTargetDataset

config = {
    'shar_path': 'data_shar',  # directory written by `lhotse shar export` above
    'sample_rate': 16000,
    'batch_duration': 100,  # total audio per batch, in seconds
    'use_bucketing': True,
    'num_buckets': 2,
}

# Single-process setup: rank 0 of a world of size 1.
dl = get_lhotse_dataloader_from_config(
    OmegaConf.create(config), global_rank=0, world_size=1, dataset=LhotseAudioToTargetDataset()
)
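
To give a feel for what `batch_duration` controls, the sketch below groups examples so that the total audio in each batch stays within a duration budget. It is a simplified illustration and not Lhotse's sampler, which also handles bucketing and shuffling; the function name `batch_by_duration` is hypothetical.

```python
def batch_by_duration(durations, max_duration):
    """Greedily group example indices so each batch holds at most
    `max_duration` seconds of audio (a single longer example gets its own batch).
    """
    batches, current, total = [], [], 0.0
    for idx, dur in enumerate(durations):
        # Close the current batch if adding this example would exceed the budget.
        if current and total + dur > max_duration:
            batches.append(current)
            current, total = [], 0.0
        current.append(idx)
        total += dur
    if current:
        batches.append(current)
    return batches


# Example durations in seconds, batched under a 100 s budget.
durations = [40.0, 35.0, 30.0, 60.0, 20.0]
batches = batch_by_duration(durations, max_duration=100.0)
```

Batching by total duration rather than by example count keeps the amount of audio per step roughly constant when utterance lengths vary.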

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

…edding)

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
@pzelasko pzelasko requested a review from anteju February 21, 2024 20:24
@github-actions github-actions bot added the ASR label Feb 21, 2024
pzelasko and others added 3 commits February 22, 2024 10:14
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>

pzelasko commented Mar 6, 2024

Note: requires some fixes in lhotse, will bump the requirements as soon as I release it. EDIT: done

@github-actions github-actions bot added the common label Mar 7, 2024
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
@pzelasko pzelasko force-pushed the feature/lhotse-audio-to-audio-dataset branch from 844096f to aac3db3 Compare March 7, 2024 17:38
pzelasko and others added 3 commits March 7, 2024 12:39
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
@pzelasko pzelasko marked this pull request as ready for review March 7, 2024 19:40

pzelasko commented Mar 7, 2024

jenkins

@pzelasko pzelasko changed the title Draft for Lhotse AudioToAudio dataset (supports ref recording and embedding) Lhotse AudioToAudio dataset (supports ref recording and embedding) Mar 7, 2024
Signed-off-by: Piotr Żelasko <petezor@gmail.com>

pzelasko commented Mar 8, 2024

jenkins

@anteju anteju mentioned this pull request Mar 9, 2024
8 tasks
…dataset (#8619)

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Mar 27, 2024

github-actions bot commented Apr 3, 2024

This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this Apr 3, 2024
@pzelasko pzelasko reopened this Apr 5, 2024

@anteju anteju left a comment


Looks great, thanks @pzelasko.

At this point, the main issue is that the prepared lhotse manifests use absolute paths.
It would be great if we could keep the relative path as in NeMo manifests.
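
A minimal sketch of the behavior being asked for, assuming relative audio paths are resolved against the directory that contains the manifest. The helper name `resolve_audio_path` and the example paths are hypothetical, not the code in this PR.

```python
from pathlib import PurePosixPath


def resolve_audio_path(audio_path, manifest_path):
    """Resolve `audio_path` relative to the manifest's directory.

    Absolute paths are returned unchanged. POSIX-style paths are assumed
    here to keep the example platform-independent.
    """
    path = PurePosixPath(audio_path)
    if path.is_absolute():
        return path
    return PurePosixPath(manifest_path).parent / path


resolved = resolve_audio_path("clean/utt1.wav", "/data/set/manifest.json")
```

Keeping relative paths in the manifest means the dataset directory can be moved or mounted elsewhere without rewriting the manifest.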

nemo/collections/asr/data/audio_to_audio_lhotse.py (outdated, resolved)
anteju and others added 2 commits April 9, 2024 13:09
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
pzelasko commented

jenkins

…onverted manifests

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

@anteju anteju left a comment


Looks great!

One change may be required in resolve_array in cutset.py to resolve the path correctly.

nemo/collections/common/data/lhotse/cutset.py (outdated, resolved)
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
pzelasko commented

jenkins

@anteju anteju self-requested a review April 12, 2024 16:51
anteju previously approved these changes Apr 12, 2024

@anteju anteju left a comment


LGTM, thanks @pzelasko

pzelasko commented

jenkins

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
pzelasko commented

jenkins

pzelasko commented

jenkins

pzelasko commented

jenkins

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
pzelasko commented

jenkins

@anteju anteju self-requested a review April 16, 2024 16:52
pzelasko commented

Everything passed in jenkins, merging

@pzelasko pzelasko merged commit 12e7cf9 into main Apr 16, 2024
90 of 127 checks passed
@pzelasko pzelasko deleted the feature/lhotse-audio-to-audio-dataset branch April 16, 2024 17:55
xingyaoww pushed a commit to xingyaoww/NeMo that referenced this pull request Apr 23, 2024
…VIDIA#8477)

* Draft for Lhotse AudioToAudio dataset (supports ref recording and embedding)

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* Integrate with speech enhancement models

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* Fix absolute path + write cuts in the output manifest

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Support channel selectors for input, reference, and target recordings

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* Support on the fly truncation and/or cutting into windows

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Bump min required lhotse version

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* Add copyright headers

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* Added unit tests checking lhotse dataloader is matching the existing dataset (NVIDIA#8619)

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* Fix batch unpacking, test_ds, use nemo logging

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* fixed some code scanning issues

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* Fixed a couple CI issues

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* Support NeMo-style resolution of relative paths in native lhotse cuts

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* Added option to leave original paths or force absolute paths in the converted manifests

Signed-off-by: Ante Jukić <ajukic@nvidia.com>

* Fix support for relative path resolution in lhotse arrays

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* Fix unit tests

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Co-authored-by: Ante Jukić <ajukic@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anteju <108555623+anteju@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
alxzhang-amazon pushed a commit to alxzhang-amazon/NeMo that referenced this pull request Apr 26, 2024
…VIDIA#8477)
suiyoubi pushed a commit that referenced this pull request May 2, 2024
…8477)
Signed-off-by: Ao Tang <aot@nvidia.com>
3 participants