Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Perl Python Shell Matlab
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
Introduction This is an always-listening distant speech multi-microphone interface mainly aiming to recognize commands for home automation control. It currently supports two languages: Greek and English, but it could very easily be adapted to any other language. It is mainly written in Python and has a GUI for recording control and results, based on python-qt4. It supports two speech recognition platforms: kaldi and htk. For recording and playback, python alsaaudio is utilized. The system is always-listening, meaning that it records speech but will not respond or recognize any commands unless it is first activated by a keyword, e.g. "sweet home listen" in English or "spitakimou" in Greek. If the system recognizes a keyword, then the command recognition module runs, waiting for a command to be spoken. The command recognition module searches inside a pre-defined grammar of commands, which is customized by the user. Apart from the keyword spotting and command recognition modules, the system has a couple of optional on-off components, such as voice activity detection, speaker localization and speech enhancement via beamforming (delay & sum or MVDR). An optional interface with Vera, a home automation controller, is also provided. Automatic speech recognition models for Greek and English are already provided in the package. You can download them for HTK-Greek and HTK-English and Kaldi-English from: http://cvsp.cs.ntua.gr/research/software/dngjnksndjgk/ Specifically for Greek, the model has been adapted with data from distant-speech databases, thus having matched train and test conditions. The system has been designed mainly for distant speech recognition, which most usually involves more than one microphone (microphone array). However, the number of channels is a tunable parameter and it can work for a single microphone as well. Note that the performance may not be optimal in this case. Regarding GUI, it contains one start and stop button to control the recording process, and also a space where system prompts appear, as well as the recognition results. Also, the user can see the waveform of the recorded speech real-time. When the system detects a keyword or a command, a relevant audio file with sythesized speech is being played back to the user. (Innoetics speech synthesis engines). Prerequisites for INSTALLATION The following should be installed and properly running on your system in order to be able to use the package: python-2.7 python-dev python-alsaaudio python-portaudio python-qt4 python-qt4-qwt5 python-matplotlib python-numpy python-scipy If you intend to use HTK, you should have the binaries: HVite, HCopy, HDecode, HERest, HParse and HHed and place them inside the bin/linux_x86_64 folder. If you intend to use kaldi, then you should have set up kaldi properly (including its dependencies). Kaldi is distributed via github. Installation instructions and more information on kaldi can be found at: http://kaldi-asr.org/doc/install.html Download kaldi at: https://github.com/kaldi-asr/kaldi