Common Voice

This is the web app for Mozilla Common Voice, a platform for collecting speech donations to create public domain datasets for training voice recognition-related tools.

Upcoming releases

Type	Release Cadence	More info
Platform code & sentences	Monthly, or as needed	Release notes
Dataset	Quarterly	Dataset metadata

How to contribute

🎉 First off, thanks for taking the time to contribute! This project would not be possible without people like you. 🎉

There are many ways to get involved with Common Voice - you don't have to know how to code to contribute!

To add or correct the translation of the web interface, please use the Mozilla localization platform Pontoon. Please note, we do not accept any direct pull requests for changing localization content.
For information on how to add or edit sentences in Common Voice, see SENTENCES.md.
For instructions on setting up a local development environment, see DEVELOPMENT.md.
For information on how to add a new language to Common Voice, see LANGUAGE.md.
For information on how to get in contact with existing language communities, see COMMUNITIES.md.

For more general guidance on building your own language community using Mozilla voice tools, please refer to the Mozilla Voice Community Playbook.

Discussion

For general discussion (feedback, ideas, random musings), head to our Discourse Category.

For bug reports or specific feature requests, please use the GitHub issue tracker.

For live chat, join us on Matrix.

Licensing and content source

This repository is released under MPL (Mozilla Public License) 2.0.

Most of the sentence text in /server/data either comes directly from user submissions in our Sentence Collector or is scraped from Wikipedia using our extractor tool. It is released under the Creative Commons CC0 public domain dedication.

Any files that follow the pattern europarl-VERSION-LANG.txt (such as europarl-v7-de.txt) were extracted, with our thanks, from the Europarl Corpus, which features transcripts of proceedings in the European Parliament.
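
If you need to tell Europarl-derived files apart from other sentence files, the naming pattern above can be checked with a short script. This is only a sketch: the regular expression and the helper name `parse_europarl_name` are illustrative assumptions rather than part of this repository, and the pattern assumes that neither the version nor the language code contains a hyphen.

```python
import re

# Assumed shape: "europarl-", a version token such as "v7",
# a language code such as "de", then ".txt".
# Assumption: neither VERSION nor LANG contains a hyphen.
EUROPARL_RE = re.compile(r"^europarl-(?P<version>[^-]+)-(?P<lang>[^-.]+)\.txt$")


def parse_europarl_name(filename):
    """Return (version, lang) for a Europarl-style filename, else None."""
    match = EUROPARL_RE.match(filename)
    if match is None:
        return None
    return match.group("version"), match.group("lang")


print(parse_europarl_name("europarl-v7-de.txt"))  # ('v7', 'de')
print(parse_europarl_name("README.md"))           # None
```

Anchoring the pattern at both ends keeps filenames that merely contain the string "europarl" from matching.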

Citation

If you use the data in a published academic work, we would appreciate it if you cited the following article:

Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F. M. and Weber, G. (2020) "Common Voice: A Massively-Multilingual Speech Corpus". Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). pp. 4211-4215.

The BibTeX entry is:

@inproceedings{commonvoice:2020,
  author = {Ardila, R. and Branson, M. and Davis, K. and Henretty, M. and Kohler, M. and Meyer, J. and Morais, R. and Saunders, L. and Tyers, F. M. and Weber, G.},
  title = {Common Voice: A Massively-Multilingual Speech Corpus},
  booktitle = {Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)},
  pages = {4211--4215},
  year = 2020
}

Cross Browser Testing

This project is tested with BrowserStack.
