Issue with PyAudio / SpeechRecognition Integration #67
Comments
Hi @jhoelzl, it seems you are overwriting a variable. SpeechRecognition looks great. How are you planning to use aubio in it? I would be interested to know more! Best, Paul
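For readers hitting the same UnboundLocalError: the thread does not show which name was overwritten, so the snippet below is only a hypothetical illustration of how this error typically arises in Python. `make_detector` is a made-up stand-in for an aubio pitch object, not part of aubio.

```python
def make_detector():
    # Hypothetical stand-in for a pitch detector object; always returns 440.0.
    return lambda signal: [440.0]

# Module-level callable, named like the value we later want to store.
pitch = make_detector()

def broken(signal):
    # Assigning to `pitch` anywhere in this function makes `pitch` local for
    # the whole function body, so the call on the right-hand side reads the
    # local name before it has been bound.
    pitch = pitch(signal)[0]
    return pitch

def fixed(signal):
    # Storing the result under a different name leaves the callable intact.
    value = pitch(signal)[0]
    return value

try:
    broken([0.0] * 4)
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)
print(fixed([0.0] * 4))
```

Keeping the detector object and its result under different names (as in `pitch = pitch_o(signal)[0]`) avoids the problem, provided `pitch_o` itself is not reassigned in the same scope.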
Thank you very much @piem, now it works! I have been using the SpeechRecognition module for several months and it is indeed a handy tool. However, the voice activity detector in the SpeechRecognition module needs improvement. Currently, only the overall energy level of the frame is used as a measure (which is not very smart), so I have integrated the WebRTC VAD, which measures the energy level in the noise and speech bands. In addition, I also want to measure MFCCs (added from python_speech_features) and pitch (added from your aubio module).
OK, sounds great! Please make sure you check out the MFCC in aubio, and let me know how it is going. For best results, a simple machine learning algorithm could be trained to discriminate speech / non-speech segments using (some of) these features (energy of each band, MFCC, pitch, ...).
Okay, I have also added the calculation of the MFCCs from your module (jhoelzl/speech_recognition@59a87bf), but I get this error:
when performing
Strange. When trying your latest git, I get this instead:
Hi, yes, it is working now when I disable the WebRTC VAD module. The problem is that the WebRTC VAD requires a frame size of 10, 20 or 30 ms. I have 16 kHz audio, so I have to set the variable accordingly. Then I have problems with the fft_size of the MFCC, because it has to be a power of two.
Okay, when I set
Good! Yes, as long as the window size is a power of 2, it should work. Note you could recompile aubio to use fftw3, but I'd recommend using power-of-2 lengths for speed.
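To make the mismatch concrete, here is a small sketch of the arithmetic. The values the poster finally chose are not shown in the thread, so the helper names below are illustrative only. It computes how many samples a 10, 20 or 30 ms frame holds at 16 kHz and the smallest power of two that can contain each frame.

```python
def frame_length(sample_rate, frame_ms):
    # Number of samples in a frame of `frame_ms` milliseconds.
    return sample_rate * frame_ms // 1000

def next_power_of_two(n):
    # Smallest power of two that is greater than or equal to n.
    p = 1
    while p < n:
        p *= 2
    return p

SAMPLE_RATE = 16000  # Hz, as mentioned in the thread

for ms in (10, 20, 30):
    n = frame_length(SAMPLE_RATE, ms)
    print(ms, n, next_power_of_two(n))
# prints:
# 10 160 256
# 20 320 512
# 30 480 512
```

None of the VAD frame lengths (160, 320, 480 samples) is itself a power of two, which is why the same buffer size cannot serve both the VAD and a power-of-two FFT without extra handling, such as using separate buffers for the two steps (an assumption about one possible workaround, not something the thread confirms).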
I also added the zero-crossing rate (ZCR). However, for the ZCR, I always get values smaller than 0.2.
Hi @jhoelzl, ZCR is a rate, so you need to divide the number of sign changes by the total number of samples. Here is the doc in aubio and the actual code. Note: the definition on Wikipedia has a -1 normalisation offset and points to a different implementation. Hope this helps,
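The definition above (sign changes divided by the total number of samples) can be sketched in plain Python as follows. This is a minimal illustration and not aubio's actual code; treating a sample equal to zero as non-negative is an assumption made here.

```python
def zero_crossing_rate(samples):
    # Count sign changes between consecutive samples and divide by the
    # total number of samples. A value of exactly 0 counts as non-negative
    # (an assumption of this sketch).
    if len(samples) < 2:
        return 0.0
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev < 0) != (cur < 0):
            crossings += 1
    return crossings / float(len(samples))

print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))  # 3 sign changes / 4 samples
print(zero_crossing_rate([1.0, 1.0, 1.0, 1.0]))    # no sign changes
```

With this normalisation the rate only approaches 1 for a signal that flips sign at every sample. A 1 kHz tone sampled at 16 kHz crosses zero about 2000 times per second, which gives a rate near 0.125, so small values are expected for most real signals.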
@piem thanks, now I understand.
Hello,
I am using Python 2.7.6 on Ubuntu 14.04 LTS.
I tried to integrate the aubio pitch detection into the SpeechRecognition module: jhoelzl/speech_recognition@355a952
However, the call
pitch = pitch_o(signal)[0]
(jhoelzl/speech_recognition@355a952#diff-873076ce119583cd8f8e749e2465a287R484) always returns ('Unexpected error:', <type 'exceptions.UnboundLocalError'>).
Does anybody have an idea or suggestion?
Thanks for your support,
Regards,
Josef