
Issue with PyAudio / SpeechRecognition Integration #67

Closed
jhoelzl opened this issue Sep 14, 2016 · 11 comments

jhoelzl commented Sep 14, 2016

Hello,

I am using Python 2.7.6 on Ubuntu 14.04 LTS.

I tried to integrate the aubio pitch detection into the SpeechRecognition module: jhoelzl/speech_recognition@355a952

However, the call pitch = pitch_o(signal)[0] (jhoelzl/speech_recognition@355a952#diff-873076ce119583cd8f8e749e2465a287R484) always returns

('Unexpected error:', <type 'exceptions.UnboundLocalError'>).

Does anybody have an idea or suggestion?

Thanks for your support.

Regards,
Josef


piem commented Sep 14, 2016

Hi @jhoelzl ,

It seems you are overwriting aubio.pitch in pitch = pitch_o(signal)[0]. The following patch seems to help: t.diff.gz.
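For readers hitting the same error: the failure can be reproduced without aubio at all. In Python, a name that is assigned anywhere in a function body is treated as local for the entire function, so reading it before the assignment raises UnboundLocalError. A minimal sketch (the names here are stand-ins, not the issue's actual code):

```python
pitch = "stand-in for the name imported from aubio"

def listen():
    try:
        detector = pitch  # UnboundLocalError: 'pitch' is local here ...
    except UnboundLocalError as exc:
        return str(exc)
    pitch = detector      # ... because of this later assignment
    return pitch

message = listen()  # returns the UnboundLocalError message
```

Renaming the local result variable so it no longer shadows the imported name resolves this kind of error.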

SpeechRecognition looks great. How are you planning to use aubio in it? I would be interested to know more!

Best, Paul


jhoelzl commented Sep 14, 2016

Thank you very much @piem, now it works!

I have been using the SpeechRecognition module for several months and it is indeed a handy tool.

However, the voice activity detector in the SpeechRecognition module needs improvement. Currently, only the overall energy level of the frame is used as a measure (which is not very robust), so I have integrated the WebRTC VAD, which measures the energy levels in the noise and speech bands.

In addition, I also want to measure MFCCs (added from python_speech_features) and pitch (added from your aubio module).

@jhoelzl jhoelzl closed this as completed Sep 14, 2016

piem commented Sep 14, 2016

OK, sounds great! Please make sure you check out the mfcc in aubio, and let me know how it is going.

For best results, a simple machine learning algorithm could be trained to discriminate speech / non-speech segments using (some of) these features (energy of each bands, mfcc, pitch, ...).
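To illustrate the idea (not the thread's actual code), here is a toy numpy sketch of such a discriminator: a nearest-centroid classifier over per-frame feature vectors. The feature layout and values are invented for the example:

```python
import numpy as np

# Hypothetical per-frame features: [band energy, first MFCC, pitch confidence].
# In practice these would come from the VAD, python_speech_features and aubio.
speech = np.array([[0.8, 12.0, 0.9], [0.7, 10.0, 0.8]])
noise = np.array([[0.1, 2.0, 0.1], [0.2, 3.0, 0.0]])

# One centroid per class, estimated from labelled training frames.
centroids = np.stack([noise.mean(axis=0), speech.mean(axis=0)])

def classify(frame_features):
    # 0 = non-speech, 1 = speech: pick the nearest class centroid.
    distances = np.linalg.norm(centroids - frame_features, axis=1)
    return int(np.argmin(distances))
```

A real system would use many labelled frames and a proper classifier, but the structure (features in, speech/non-speech decision out) is the same.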


jhoelzl commented Sep 14, 2016

Okay, I have also added the calculation of the MFCCs from your module (jhoelzl/speech_recognition@59a87bf), but I get this error:

('Unexpected error:', <type 'exceptions.ValueError'>)

when performing spec = p(signal).

@jhoelzl jhoelzl reopened this Sep 14, 2016

piem commented Sep 14, 2016

Strange. When trying your latest git, I get this instead:

[...]
Say something!
Traceback (most recent call last):
  File "examples/microphone_recognition.py", line 11, in <module>
    audio = r.listen(source)
  File "/home/piem/projects/aubio/contrib/speech_recognition/speech_recognition/__init__.py", line 490, in listen
    spec = p(signal)
ValueError: input fvec has length 1024, but pvoc expects length 128


jhoelzl commented Sep 14, 2016

Hi, yes, it is working now when I disable the WebRTC VAD module. The problem is that the WebRTC VAD requires a frame size of 10, 20, or 30 ms. I have 16 kHz audio, so I have to set the variable source.CHUNK = 480 in my application to get a 30 ms frame.

Then I have problems with the fft_size of the MFCC, because it has to be a power of two.
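The arithmetic behind those numbers, as a quick sketch (values taken from this thread):

```python
SAMPLE_RATE = 16000  # Hz, as used above

def vad_chunk(frame_ms):
    # Samples per WebRTC VAD frame of the given duration.
    return SAMPLE_RATE * frame_ms // 1000

# The VAD accepts only 10, 20 or 30 ms frames: 160, 320 or 480 samples
# at 16 kHz. None of these is a power of two, hence the clash with the
# FFT size expected by the MFCC analysis.
```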


jhoelzl commented Sep 14, 2016

Okay, when I set m_hop_s = 480 (instead of m_win_s // 4), it works.
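In other words, a sketch of the constraint (assuming the variable names from the linked commit): the FFT window stays a power of two while the hop matches the 30 ms VAD chunk.

```python
m_win_s = 512  # FFT window size: must be a power of two for aubio's default FFT
m_hop_s = 480  # hop size: exactly one 30 ms WebRTC VAD chunk at 16 kHz

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

# Only the window is constrained to a power of two; the hop can follow
# the VAD chunk size, so each analysis step receives one 480-sample frame.
```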


piem commented Sep 14, 2016

Good! Yes, as long as the window size is a power of 2, it should work.

Note you could recompile aubio to use fftw3, but I'd recommend sticking with power-of-2 lengths for speed.

@jhoelzl jhoelzl closed this as completed Sep 14, 2016

jhoelzl commented Sep 19, 2016

I also added the zero-crossing rate (ZCR):
jhoelzl/speech_recognition@03185a0
jhoelzl/speech_recognition@908ebb9

However, for the ZCR, I always get values smaller than 0.2.
I thought this should be a positive integer value, since it is defined as the number of times in a sound sample where the amplitude of the sound wave changes sign.

@jhoelzl jhoelzl reopened this Sep 19, 2016

piem commented Sep 19, 2016

Hi @jhoelzl,

ZCR is a rate, so you need to divide the number of sign changes by the total number of samples. Here is the doc in aubio and the actual code.

Note: the definition on Wikipedia has a -1 normalisation offset and points to a different implementation.

hope this helps,
best, Paul
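The normalisation Paul describes can be sketched in a few lines of numpy (a stand-in illustration, not aubio's actual implementation):

```python
import numpy as np

def zero_crossing_rate(frame):
    # Count sign changes between consecutive samples, then divide by the
    # frame length: the result is a rate in [0, 1), not a raw count,
    # which is why values below 0.2 are perfectly normal.
    signs = np.signbit(frame)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings / float(len(frame))
```

For example, a frame that alternates sign every sample, [1, -1, 1, -1], has 3 crossings over 4 samples, a rate of 0.75.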


jhoelzl commented Sep 19, 2016

@piem thanks, now I understand!

@jhoelzl jhoelzl closed this as completed Sep 19, 2016