How to get single value for xi_hat? #15
Hi Olaf, there are many objective measures available that can rate the quality and intelligibility of speech, e.g. PESQ or STOI. Are you trying to evaluate speech, or audio in general? If the latter, PEAQ and its derivatives are probably what you are looking for.
From: Olaf Thiele, Wednesday, 18 December 2019:
Thanks for your work. We would like to test whether your approach works better than what we are currently using to detect "good" audio. We are inferring with
deepxi.py --infer 1 --out_type xi_hat --gain mmse-lsa
and get the .mat files containing the output arrays. How do we interpret this data, or do you see an easy function to boil it down to a single value?
Thanks, will look into that direction. But we are also interested in "eliminating" the noise, and have tried your tool with some success. We are considering transferring/retraining it with our own data, as we are already using DeepSpeech and know which chunks are of good quality. But first, we would like to see how good the current model is. We haven't looked too deeply into your code and were therefore wondering what to do with the .mat files.
Did I reply to this?
Not yet :-) It would be great to know whether you think a simple standard deviation of the included 257-item vectors would yield something useful.
You could simply use deepxi.py --infer 1 --out_type y --gain srwf to save the enhanced speech .wav files, and then give them to DeepSpeech. This would be very easy to do. A more complex alternative would be to include the enhanced speech magnitude spectrum produced by Deep Xi as part of the front end of DeepSpeech. DeepSpeech utilises MFCCs as features, which are computed from the magnitude spectrum of the given .wav file.
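To illustrate the second option, here is a minimal sketch of how MFCCs are derived from a magnitude spectrum, i.e. where an enhanced magnitude spectrum could be spliced into an ASR front end. The filterbank size, coefficient count, and sample rate below are illustrative defaults, not DeepSpeech's exact configuration.

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular mel filterbank mapping FFT bins to mel bands.
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            if c > l:
                fb[i - 1, k] = (k - l) / (c - l)
        for k in range(c, r):
            if r > c:
                fb[i - 1, k] = (r - k) / (r - c)
    return fb

def mfcc_from_magnitude(mag_frames, sr=16000, n_mels=26, n_ceps=13):
    # mag_frames: (n_frames, n_fft//2 + 1) magnitude spectrum,
    # e.g. the enhanced spectrum Deep Xi produces per frame.
    n_fft = (mag_frames.shape[1] - 1) * 2
    fb = mel_filterbank(n_mels, n_fft, sr)
    mel_energy = (mag_frames ** 2) @ fb.T          # power -> mel bands
    log_mel = np.log(mel_energy + 1e-10)           # compress dynamic range
    return dct(log_mel, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

For 257-bin frames (512-point FFT at 16 kHz) this returns one 13-coefficient MFCC vector per frame.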
Thanks, we already tried that, with mixed results. We would therefore like to find out which types of background noise your algorithm handles better. It would be great to have some sort of measurement that shows how noisy your algorithm rates a certain chunk. Do you see a way to do that?
With the audio that you are using, do you have a reference version? I.e. an ideal version, or a version without noise?
No, we have around 100 000 chunks, and around a third are manually labelled as noisy, with heavy or light noise labels. It would be great to see whether your algorithm would label them the same way, or where it differs. We could then label them automatically, or clean them before feeding them to DeepSpeech, to get better results.
You could use the a priori SNR in dB, averaged over the frame, to understand how much noise is in each time region of a chunk, or averaged over the chunk if you just want to know its overall SNR. The overall SNR of the chunk could then be used as the label.
Great, so if I understand you correctly, I could average the vector output in the .mat files, as each 257-element vector represents a 16 ms window, and the .mat values are the normalised dB values. Is there any indication of which values are noisy or clean?
So the window size is 32 ms, where the windows overlap by 16 ms, i.e. there is a 32 ms window every 16 ms. The .mat file has the a priori SNR values on a linear scale; 10*log10(·) would give the a priori SNR values in dB. Averaging the 257-point vectors would give the average a priori SNR in dB for each of the frames. A value of 30 dB would indicate that the frame is largely dominated by speech; a value of -10 dB would indicate that the frame is largely dominated by noise. Hope this helps.
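The averaging described above can be sketched as follows. This assumes the .mat file holds a single 2-D array of linear xi_hat values, one 257-bin vector per frame; the variable name inside the file and the labelling thresholds are assumptions to check against your own data, not values from Deep Xi.

```python
import numpy as np

def frame_and_chunk_snr_db(xi_hat):
    # xi_hat: (n_frames, 257) linear a priori SNR estimates,
    # one 257-bin vector per 32 ms window (16 ms hop).
    xi_db = 10.0 * np.log10(np.maximum(xi_hat, 1e-12))  # per-bin dB
    frame_db = xi_db.mean(axis=1)       # average a priori SNR per frame
    chunk_db = float(frame_db.mean())   # single value for the whole chunk
    return frame_db, chunk_db

def label_chunk(chunk_db, heavy_below=0.0, light_below=15.0):
    # Thresholds are illustrative guesses; calibrate against
    # your manually labelled chunks.
    if chunk_db < heavy_below:
        return "heavy noise"
    if chunk_db < light_below:
        return "light noise"
    return "clean"
```

In practice xi_hat would be read from the saved file with scipy.io.loadmat; inspect the returned dict to find the array's key. With the dB guideposts above, a chunk averaging around 30 dB would be labelled clean and one around -10 dB heavy noise.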
Perfect, thanks a lot mate, and happy holidays!