lex4all: pronunciation LEXicons for Any Low-resource Language

A project of the Department of Computational Linguistics, Saarland University, Germany

Creators: Anjana Vakil & Max Paulus

Advisors: Alexis Palmer & Michaela Regneri

Contributors: Kayokwa Chibuye (University of Cape Town, South Africa)

Developers trying to incorporate speech recognition interfaces in a low-resource language (LRL) into their applications currently face the hurdle of not finding recognition engines trained on their target language. Although tools such as Carnegie Mellon University's Sphinx simplify the creation of new acoustic models for recognition, they require large amounts of training data (audio recordings) in the target language. However, for small-vocabulary applications, an existing recognizer for a high-resource language (HRL) can be used to perform recognition in the target language. This requires a pronunciation lexicon mapping the relevant words in the target language into sequences of sounds in the HRL.

lex4all is an easy-to-use desktop application for Windows that allows even naive users to automatically create a pronunciation lexicon for words in any language, using a small number of audio recordings and a pre-existing recognition engine in an HRL such as English. The resulting lexicon can then be used to add small-vocabulary speech recognition functionality to applications in the LRL.

How it works

A simple user interface allows the user to specify one written form (text string) and one or more audio samples (.wav files) for each word in the target vocabulary, and to set other options (e.g. number of pronunciations per word, name/save location of the lexicon file). The audio is then passed to a speech recognition engine for an HRL (English). An automatic pronunciation generation algorithm (the Salaam method, [2-3]) finds the best pronunciation(s) for each word in the LRL vocabulary. The program outputs a pronunciation lexicon (.pls XML file). This lexicon file follows the standard pronunciation lexicon format (http://www.w3.org/TR/pronunciation-lexicon/), so it can be directly included in a speech recognition application, e.g. one built using the Microsoft Speech Platform API.
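To give a rough idea of what a .pls lexicon looks like, here is a minimal Python sketch that builds one with the standard library. It is not lex4all's own code (lex4all is written in C#). The element names and namespace come from the W3C Pronunciation Lexicon Specification; the function name `build_lexicon`, the sample word, its pronunciations, and the choice of `alphabet="ipa"` are assumptions made for illustration, and the phonetic alphabet that lex4all actually writes may differ.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Pronunciation Lexicon Specification (PLS) 1.0.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_lexicon(entries, lang="en-US", alphabet="ipa"):
    """Build a PLS <lexicon> element.

    entries maps each written form to a list of pronunciation strings.
    lang is the language of the recognizer the lexicon targets.
    alphabet is an assumption here; use whatever your recognizer expects.
    """
    # Serialize PLS elements without a prefix (as the default namespace).
    ET.register_namespace("", PLS_NS)
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for written_form, pronunciations in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = written_form
        # One <phoneme> element per alternative pronunciation of the word.
        for pronunciation in pronunciations:
            ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = pronunciation
    return lexicon


if __name__ == "__main__":
    # Hypothetical target-language word with two made-up pronunciations.
    lexicon = build_lexicon({"moni": ["m oʊ n i", "m ɔ n i"]})
    print(ET.tostring(lexicon, encoding="unicode"))
```

Each `lexeme` pairs one written form with one or more `phoneme` entries, which is how several pronunciations per word can be stored in a single file.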

For a guided step-by-step walkthrough with screenshots, see: http://lex4all.github.io/lex4all/walkthrough.html

Features

  • Simple graphical interface
  • Use existing .wav audio files, or use the built-in audio recorder
  • Advanced options (number of pronunciations per word, discriminative training [3])
  • Evaluation module for testing/research
  • Built-in support for 5 source languages: German (de-DE), English (en-US), French (fr-FR), Japanese (ja-JP), Chinese (zh-CN)

Requirements & Installation

Requirements:

Installation:

  • Download the project from GitHub & unzip the archive.
  • Double-click the link run-lex4all.exe in the folder you just downloaded.
  • Enjoy using lex4all!

For troubleshooting help, please see our wiki page: https://github.com/lex4all/lex4all/wiki/Installation-&-troubleshooting

Backend & resources

This approach to language-independent recognition requires an existing high-quality speech recognition engine with a usable API; we chose to use the English recognition engine of the Microsoft Speech Platform, so lex4all is written in C#. The audio recording feature was built using the NAudio API.

To automatically discover the pronunciation mappings we implement the Salaam algorithm as presented in [2-3]; a slight modification was made to reduce the algorithm's running time. In addition to the basic discovery algorithm [2], users have the choice of applying the discriminative training algorithm [3] as well.
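The last step, choosing the best pronunciation(s) per word, can be illustrated with a short Python sketch. This is a simplified illustration, not the Salaam algorithm from [2-3] and not lex4all's C# implementation: it assumes that candidate pronunciations have already been produced and scored by the recognizer for each audio sample, and it ranks them by their mean score. The function name, the candidate strings, the scores, and the use of a plain average are all assumptions; the published method may generate and aggregate candidates differently.

```python
from statistics import mean


def best_pronunciations(candidate_scores, n=1):
    """Return the n candidates with the highest mean score.

    candidate_scores maps each candidate pronunciation to a list of
    recognizer scores, one per audio sample of the word (hypothetical data).
    """
    ranked = sorted(
        candidate_scores,
        key=lambda candidate: mean(candidate_scores[candidate]),
        reverse=True,
    )
    return ranked[:n]


if __name__ == "__main__":
    # Made-up scores for three candidates over two audio samples.
    scores = {
        "M OW N IY": [0.75, 0.25],
        "M AA N IY": [1.0, 0.5],
        "N OW N IY": [0.25, 0.25],
    }
    print(best_pronunciations(scores, n=2))
```

The `n` parameter corresponds to the "number of pronunciations per word" option: a larger value keeps more alternatives in the lexicon for each word.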

Publications

Anjana Vakil, Max Paulus, Alexis Palmer and Michaela Regneri. 2014. "lex4all: A language-independent tool for building and evaluating pronunciation lexicons for small-vocabulary speech recognition." In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014): System Demonstrations. [pdf]

Anjana Vakil and Alexis Palmer. 2014. "Cross-language mapping for small-vocabulary ASR in under-resourced languages: investigating the impact of source language choice." In: Proceedings of the 4th Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU'14). [pdf]

Chibuye, N.K., Rosenstock, T. and DeRenzi, B., 2018. "Cross-language Phoneme Mapping for Low-resource Languages: An Exploration of Benefits and Trade-offs." In: INTERSPEECH (pp. 2623-2627). [pdf]

References

[1] Jahanzeb Sherwani. 2009. "Speech interfaces for information access by low literate users." PhD thesis. Pittsburgh, PA, USA: Carnegie Mellon University. [pdf]

[2] Fang Qiao, Jahanzeb Sherwani, and Roni Rosenfeld. 2010. "Small-vocabulary speech recognition for resource-scarce languages." In: Proceedings of the First ACM Symposium on Computing for Development (ACM DEV '10). [pdf]

[3] Hao Yee Chan and Roni Rosenfeld. 2012. "Discriminative pronunciation learning for speech recognition for resource scarce languages." In: Proceedings of the 2nd ACM Symposium on Computing for Development (ACM DEV '12). [pdf]
