Add Esperanto ASR example #5772

Merged: 16 commits merged on Jan 25, 2023

andrusenkoau
Collaborator

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

What does this PR do ?

Adds an ASR example for training an Esperanto Conformer-CTC-large model.

Collection: ASR

Changelog

  • Adds Esperanto example to docs/source/asr/examples/

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
andrusenkoau added the documentation and ASR labels Jan 11, 2023
Comment on lines 255 to 262
    json.dumps(
        {
            "audio_filepath": sample["audio_filepath"],
            "duration": sample["duration"],
            "text": sample["sentence"],
        },
        ensure_ascii=False,
    )
Collaborator

@SeanNaren SeanNaren Jan 11, 2023


Hey @titu1994, are we OK with enforcing this manifest entry format in the main script? I.e., only these fields get added to the manifest. This seems fine, as all other ASR manifests rely on this format, but I wanted to check whether there are any edge cases. Currently the main script just drops a few things from the sample and saves the rest, which could introduce other parameters into the manifest.

This would allow us to merge the custom logic here into the main script (and remove this separate script).
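
The difference between the two approaches in this comment can be sketched as follows. This is only an illustration, assuming each sample is a plain dict: the helper names `to_manifest_line_fixed` and `to_manifest_line_drop`, the `dropped` key tuple, and the example sample are hypothetical and are not taken from the NeMo scripts.

```python
import json


def to_manifest_line_fixed(sample):
    """Hypothetical helper: keep only the three fields from the reviewed snippet."""
    entry = {
        "audio_filepath": sample["audio_filepath"],
        "duration": sample["duration"],
        "text": sample["sentence"],
    }
    # ensure_ascii=False writes characters such as "ĉ" as-is instead of \uXXXX escapes.
    return json.dumps(entry, ensure_ascii=False)


def to_manifest_line_drop(sample, dropped=("audio", "path")):
    """Hypothetical helper: drop a few keys and keep everything else."""
    entry = {key: value for key, value in sample.items() if key not in dropped}
    return json.dumps(entry, ensure_ascii=False)


# Made-up sample; the extra "up_votes" key stands in for any unexpected field.
sample = {
    "audio_filepath": "clips/example.wav",
    "duration": 3.2,
    "sentence": "ĉu vi parolas esperanton",
    "audio": [0.0, 0.1],
    "path": "clips/example.mp3",
    "up_votes": 2,
}

fixed = json.loads(to_manifest_line_fixed(sample))
loose = json.loads(to_manifest_line_drop(sample))
print(sorted(fixed))
print(sorted(loose))
```

With the fixed format, the manifest entry contains the same three keys regardless of what else the sample carries. With the drop-based approach, any key that is not explicitly dropped (here `up_votes`) ends up in the manifest, which is the edge case the comment asks about.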

andrusenkoau and others added 8 commits January 16, 2023 11:35
Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
This reverts commit 87c08fb.

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
This reverts commit 3960e8d.

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>
@andrusenkoau
Collaborator Author

@SeanNaren, thank you for the suggested fixes. I have applied them.

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
@andrusenkoau andrusenkoau merged commit e626026 into NVIDIA:main Jan 25, 2023
Kipok pushed a commit to Kipok/NeMo that referenced this pull request Jan 31, 2023
* add experanto example

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update esperanto_asr.rst

Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>

* add mcv_asr_dataset key for MCV ASR dataset

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* remove convert_hf_dataset_to_nemo_v2.py with code dublication

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add esperanto example

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit 87c08fb.

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* Revert "add mcv_asr_dataset key for MCV ASR dataset"

This reverts commit 3960e8d.

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>

* Update esperanto_asr.rst

Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>

* Apply review suggestions

Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
Signed-off-by: Andrei Andrusenko <52885736+andrusenkoau@users.noreply.github.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
ericharper pushed a commit that referenced this pull request Jan 31, 2023
ericharper pushed a commit that referenced this pull request Jan 31, 2023
Kipok pushed a commit to Kipok/NeMo that referenced this pull request Jan 31, 2023
titu1994 pushed a commit to titu1994/NeMo that referenced this pull request Mar 24, 2023