Speech-Recognition-system

Microsoft has placed increasing emphasis on bringing speech technologies into mainstream use. This focus has led to products such as Speech Server, which is used to implement speech-enabled telephony systems, and Voice Command, which allows users to control Windows Mobile devices with spoken commands. It is therefore no surprise that the speech team at Microsoft was far from idle during the development of Windows Vista™. The strategy of coupling powerful speech technology with a powerful API continued through to Windows Vista. Windows Vista and later versions include a built-in speech recognition user interface designed specifically for users who need to control Windows and enter text without a keyboard or mouse. There is also a state-of-the-art, general-purpose speech recognition engine that is not only highly accurate but also available in a variety of languages. Windows Vista also includes the first of Microsoft's new generation of speech synthesizers, completely rewritten to take advantage of the latest techniques.

The Kinect's microphone array captures 24-bit audio data, which is pre-processed to suppress background noise and remove echoes using acoustic echo cancellation (AEC). This pre-processing improves the quality of the audio data. Noise reduction and echo cancellation are more effective with a microphone array than with a single microphone: when several microphones are used, sound from a given source arrives at each microphone at a slightly different time, and these differences can be exploited to separate the source from sound arriving from other directions. Audio captured by the Kinect microphones is passed to a preliminary processor, where beamforming and sound source localization techniques determine the direction of the sound source, so that the set of microphones behaves as a single directional microphone. The sound from the selected source (beam) is split into approximately 24 frames per second, and these frames are then sent to the PC for further processing. Figure 3.3 shows the implemented speech recognition method.
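The time-difference idea behind beamforming can be illustrated with a small, self-contained sketch. It is not the Kinect's internal processing, which runs in the device and its SDK; it assumes two microphones, a synthetic broadband signal standing in for speech, and a whole-sample arrival delay. `estimate_delay` finds the delay between the two channels by brute-force cross-correlation, and `delay_and_sum` aligns and averages the channels, which is the simplest form of beamforming. All function names and parameters are hypothetical.

```python
import math
import random


def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) in [-max_lag, max_lag] that maximizes the
    cross-correlation of `ref` and `other`. A positive lag means the sound
    reaches the microphone recording `other` later than the one recording `ref`."""
    n = len(ref)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Sum ref[i] * other[i + lag] over the overlapping samples only.
        score = sum(ref[i] * other[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, delays):
    """Advance each channel by its delay (in samples) and average the aligned
    samples: the source adds coherently, uncorrelated noise partly cancels."""
    n = len(channels[0])
    out = []
    for i in range(n):
        aligned = [ch[i + d] for ch, d in zip(channels, delays) if 0 <= i + d < n]
        out.append(sum(aligned) / len(aligned) if aligned else 0.0)
    return out


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic two-microphone recording (hypothetical parameters).
random.seed(0)
n = 400
true_delay = 3  # samples by which the sound reaches mic 2 later than mic 1
source = [random.gauss(0.0, 1.0) for _ in range(n)]  # broadband, speech-like
mic1 = [s + random.gauss(0.0, 0.3) for s in source]
mic2 = [(source[i - true_delay] if i >= true_delay else 0.0) + random.gauss(0.0, 0.3)
        for i in range(n)]

lag = estimate_delay(mic1, mic2, max_lag=10)
beam = delay_and_sum([mic1, mic2], [0, lag])

err_single = mean_squared_error(mic1, source)
err_beam = mean_squared_error(beam, source)
print("estimated delay:", lag)
print(f"MSE single mic: {err_single:.3f}, delay-and-sum: {err_beam:.3f}")
```

Averaging two aligned channels roughly halves the variance of uncorrelated noise, which is why the delay-and-sum error comes out lower than the single-microphone error. Practical arrays use more microphones, fractional delays, and adaptive weights.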

The speech module has two parts: the Microsoft Speech Recognition Engine and the Microsoft Speech Recognition Grammar. The engine matches vocal input against the words and phrases defined by the grammar rules. The speech recognition grammar supports two kinds of rules. The first is the simple rule, which recognizes short words and commands. The second can recognize and organize semantic content across various user accents. This system implements the simple rule: the user adds a word to the command list, after which the word is initialized in the grammar. Figure 3.4 and Appendix A show the front panel and block diagram of the developed speech recognition system, respectively. The system was built in LabVIEW using the GrammarBuilder and SpeechRecognizer classes from "System.Speech (4.0.0.0)", instantiated through the .NET Constructor node.
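The simple command-list rule can be sketched as follows. In System.Speech this kind of grammar is typically built by passing a `Choices` list of phrases to a `GrammarBuilder`, creating a `Grammar` from it, and loading that into the recognizer; the Python below does not call that API. It is a hypothetical stand-in that models only the matching step: the user adds commands to a list, and a recognition hypothesis is accepted only if it is in that list and its confidence meets an assumed threshold of 0.6. The class name, threshold, and example hypotheses are illustrative.

```python
class CommandGrammar:
    """Hypothetical stand-in for a simple command-list grammar: the recognizer
    may only return phrases the user has added to the list."""

    def __init__(self, min_confidence=0.6):
        self.commands = set()
        self.min_confidence = min_confidence  # assumed rejection threshold

    @staticmethod
    def _normalize(phrase):
        # Case- and whitespace-insensitive, so "Turn  Left" == "turn left".
        return " ".join(phrase.lower().split())

    def add_command(self, phrase):
        """Add a word or phrase to the command list ("initializing" it)."""
        self.commands.add(self._normalize(phrase))

    def match(self, hypotheses):
        """Pick the highest-confidence (text, confidence) hypothesis that is in
        the command list and meets the threshold; return None otherwise."""
        best = None
        for text, confidence in hypotheses:
            key = self._normalize(text)
            if key in self.commands and confidence >= self.min_confidence:
                if best is None or confidence > best[1]:
                    best = (key, confidence)
        return best


grammar = CommandGrammar()
for word in ["forward", "backward", "turn left", "stop"]:
    grammar.add_command(word)

print(grammar.match([("Forward", 0.82), ("four ward", 0.11)]))  # ('forward', 0.82)
print(grammar.match([("stop", 0.40)]))   # None: below threshold
print(grammar.match([("hello", 0.95)]))  # None: not in the command list
```

Normalizing case and whitespace keeps the command list free of near-duplicates; the rejection threshold trades missed commands against false activations and would need tuning for a real microphone setup.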