GigaSpeech on HuggingFace #117

Open
dophist opened this issue Jun 27, 2022 · 0 comments
Labels
documentation Improvements or additions to documentation

dophist commented Jun 27, 2022

The GigaSpeech dataset is now available on the HuggingFace Hub.

Highlights of GigaSpeech on HuggingFace

  • Easy to use (a two-liner in Python)
  • Smoother and faster downloads from the US & EU, with support for on-the-fly downloading during training
  • Preprocessed:
    • decompressed
    • short audio files (.wav) are segmented and extracted from the raw long audio
    • supervisions are extracted from the raw metadata.json
  • Subsets can be downloaded separately (e.g. XS/S/M/L/XL for training, DEV/TEST for benchmarking)
  • Users can even listen to audio samples via HuggingFace's dataset viewer
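
As a rough illustration of the "two-liner" and the on-the-fly downloading mentioned above, here is a minimal sketch using the HuggingFace `datasets` library. It is not taken from this post: the repository id `speechcolab/gigaspeech` and the lowercase subset names are assumptions, and `load_kwargs` / `load_gigaspeech` are hypothetical helpers written only for this example. Access may also require accepting the dataset terms on the Hub and logging in first.

```python
# Assumptions (not stated in the post): the Hub repository id and the
# lowercase subset (config) names below.
REPO_ID = "speechcolab/gigaspeech"
TRAIN_SUBSETS = ("xs", "s", "m", "l", "xl")
EVAL_SUBSETS = ("dev", "test")


def load_kwargs(subset="xs", streaming=False):
    """Hypothetical helper: build keyword arguments for datasets.load_dataset."""
    name = subset.lower()
    if name not in TRAIN_SUBSETS + EVAL_SUBSETS:
        raise ValueError(f"unknown GigaSpeech subset: {subset!r}")
    # streaming=True asks the library to fetch data lazily while iterating,
    # instead of downloading the whole subset up front.
    return {"path": REPO_ID, "name": name, "streaming": streaming}


def load_gigaspeech(subset="xs", streaming=False):
    """Hypothetical wrapper around the actual two-liner (import + load_dataset)."""
    # Imported lazily so load_kwargs can be used without the library installed.
    from datasets import load_dataset

    return load_dataset(**load_kwargs(subset, streaming))
```

Under these assumptions, `load_gigaspeech("xs")` would download only the smallest training subset, while `load_gigaspeech("xl", streaming=True)` would iterate over the largest one without storing it locally first.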

How-to

Useful links

Credits

Many thanks to the Dataset Team and Speech Team at HuggingFace, particularly @polinaeterna, @patrickvonplaten, and @sanchit-gandhi. Thanks to their work, GigaSpeech is now more accessible to the entire speech community!

@dophist dophist added the documentation Improvements or additions to documentation label Jun 27, 2022