[Fix] Use number of channels when calculating BAN #37

anteju · 2023-03-20T16:05:05Z

It seems the current implementation is missing a scaling by $M^{-1/2}$, where $M$ is the number of channels, when calculating the blind analytic normalization (BAN).
This introduces an excess gain of $10 \log_{10} M~\text{dB}$, which can cause clipping depending on $M$ and the input signal level.

Please refer to eq. (17) in E. Warsitz and R. Haeb-Umbach, "Blind Acoustic Beamforming Based on Generalized Eigenvalue Decomposition," IEEE Transactions on Audio, Speech, and Language Processing, 2007.
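For reference, the BAN gain as we read eq. (17) is written out below, with $\mathbf{F}$ the beamformer, $\boldsymbol{\Phi}_{NN}$ the (Hermitian) noise PSD matrix and $M$ the number of channels. The short check assumes a rank-one target whose acoustic transfer function $\mathbf{H}$ has unit-magnitude entries, so that $\mathbf{H}^H\mathbf{H} = M$, and a GEV beamformer $\mathbf{F} = c\,\boldsymbol{\Phi}_{NN}^{-1}\mathbf{H}$ for some scalar $c \neq 0$:

$$
\begin{aligned}
g_{\text{BAN}} &= \frac{\sqrt{\mathbf{F}^H \boldsymbol{\Phi}_{NN} \boldsymbol{\Phi}_{NN} \mathbf{F} / M}}{\mathbf{F}^H \boldsymbol{\Phi}_{NN} \mathbf{F}}
= \frac{\sqrt{|c|^2\, \mathbf{H}^H \mathbf{H} / M}}{|c|^2\, \mathbf{H}^H \boldsymbol{\Phi}_{NN}^{-1} \mathbf{H}}
= \frac{1}{|c|\, \mathbf{H}^H \boldsymbol{\Phi}_{NN}^{-1} \mathbf{H}}, \\
\left| g_{\text{BAN}}\, \mathbf{F}^H \mathbf{H} \right| &= \frac{|c|\, \mathbf{H}^H \boldsymbol{\Phi}_{NN}^{-1} \mathbf{H}}{|c|\, \mathbf{H}^H \boldsymbol{\Phi}_{NN}^{-1} \mathbf{H}} = 1.
\end{aligned}
$$

Dropping the $1/M$ inside the square root multiplies this response by $\sqrt{M}$, i.e. a power gain of $20 \log_{10} \sqrt{M} = 10 \log_{10} M$ dB (about 6 dB for $M = 4$).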

anteju · 2023-03-20T16:09:21Z

@boeddeker, possibly of interest -- this seems to apply to pb_* repos as well.

desh2608 · 2023-03-20T17:12:39Z

@boeddeker since you are the expert on this, I will defer to your opinion.

boeddeker · 2023-03-21T10:13:39Z

Yes, it is missing. It was already missing when we translated the MATLAB code.
I checked it once, but only with [1, 0, 0, ...] as the beamformer; in that case the function works as expected, but when you change the beamformer to [1, 1, 1, ...], the scale is too large.
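A minimal pure-Python sketch of that check, assuming spatially white noise ($\boldsymbol{\Phi}_{NN} = \mathbf{I}$) and a unit-magnitude acoustic transfer function $\mathbf{h} = [1, \dots, 1]$. `ban_gain` and `response` are hypothetical helpers, not functions from this repository or the pb_* packages; the `correct_scaling` flag toggles the division by $M$ inside the square root.

```python
import math

def ban_gain(w, phi, correct_scaling=True):
    """BAN gain for beamformer w (list of complex weights) and Hermitian noise
    PSD matrix phi (list of rows). With correct_scaling=True the numerator
    inside the square root is divided by the number of channels M."""
    m = len(w)
    phi_w = [sum(phi[i][j] * w[j] for j in range(m)) for i in range(m)]
    # w^H Phi Phi w equals ||Phi w||^2 because Phi is Hermitian.
    num = sum(abs(x) ** 2 for x in phi_w)
    if correct_scaling:
        num /= m
    # w^H Phi w is real and positive for a positive definite Phi.
    den = sum(wi.conjugate() * x for wi, x in zip(w, phi_w)).real
    return math.sqrt(num) / den

def response(w, phi, h, correct_scaling=True):
    """Magnitude |g_BAN * w^H h| of the normalized beamformer's response to h."""
    g = ban_gain(w, phi, correct_scaling)
    return abs(g * sum(wi.conjugate() * hi for wi, hi in zip(w, h)))

M = 4
# Assumption: spatially white noise, i.e. an identity noise PSD matrix.
phi = [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]
# Assumption: unit-magnitude acoustic transfer function.
h = [1.0 + 0j] * M
one_hot = [1.0 + 0j] + [0j] * (M - 1)
all_ones = [1.0 + 0j] * M

print(response(one_hot, phi, h, correct_scaling=False))   # 1.0
print(response(all_ones, phi, h, correct_scaling=False))  # 2.0, i.e. sqrt(M)
print(response(all_ones, phi, h, correct_scaling=True))   # 1.0
```

Without the correction, the one-hot beamformer has unit response while the all-ones beamformer has a response of $\sqrt{M}$, matching the observation above. With the correction, the all-ones beamformer (which equals $\boldsymbol{\Phi}_{NN}^{-1}\mathbf{h}$ in this toy setting) is unit-gain; the one-hot beamformer then gets $1/\sqrt{M}$, since it is not of the form $\boldsymbol{\Phi}_{NN}^{-1}\mathbf{h}$.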

Depending on your application, you may want to consider normalizing the signal before writing files to disk.
For ASR, we have so far always observed positive effects, so this scaling error never caused us any issues, since we remove the scale afterward.

desh2608 · 2023-03-21T12:53:25Z

I will keep this PR open (for visibility). As pointed out by @boeddeker, it does not seem to impact ASR much. For the CHiME-7 DASR challenge, participants can choose whether or not they want to apply it in their system.

anteju · 2023-03-21T15:43:51Z

@boeddeker & @desh2608, I just wanted to let you know; it is up to you whether to include it or not.
Since you mentioned CHiME: processed audio is saved to fixed point format and some examples are clipped.
This likely does not impact ASR significantly. However, it is nevertheless incorrect.
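A minimal sketch of how the excess gain turns into clipping, assuming a writer that converts float samples to 16-bit PCM by scaling with 32767 and saturating out-of-range values. `to_int16` is a hypothetical helper, not the CHiME-7 or paderbox code. With $M = 4$ the missing factor scales the amplitude by $\sqrt{4} = 2$, so a signal peaking at 0.6 ends up at 1.2.

```python
def to_int16(samples):
    """Quantize float samples (nominal range [-1, 1]) to 16-bit PCM values.
    Assumption: out-of-range values are saturated, as a clipping writer would
    do. Returns the quantized samples and the number of clipped samples."""
    out, clipped = [], 0
    for s in samples:
        q = round(s * 32767)
        if q > 32767 or q < -32768:
            clipped += 1
            q = max(-32768, min(32767, q))  # saturate to the int16 range
        out.append(q)
    return out, clipped

x = [0.1, -0.3, 0.6, -0.55]  # peak 0.6: fits into int16
excess = 2.0                 # sqrt(M) amplitude gain for M = 4 channels
_, n_ok = to_int16(x)
q, n_clipped = to_int16([excess * s for s in x])
print(n_ok, n_clipped)       # 0 2
```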

desh2608 · 2023-03-21T15:59:31Z

Thanks for the heads up, in any case.

boeddeker · 2023-03-21T16:17:49Z

> processed audio is saved to fixed point format and some examples are clipped.

This depends on how you write the data to disk. Internally, we normalize the signal before writing an audio file
(see `paderbox.io.dump_audio`) to minimize quantization issues.
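The general idea of normalizing before quantization can be sketched as follows. This is not `paderbox.io.dump_audio`; `normalize_and_quantize`, `dequantize` and the target peak of 0.99 are hypothetical choices for illustration. Scaling the signal close to full scale avoids clipping and uses the full int16 resolution, so the quantization error relative to the signal shrinks for quiet recordings.

```python
def normalize_and_quantize(samples, target_peak=0.99):
    """Scale samples so their absolute peak equals target_peak (an assumed
    value), then quantize to int16. Returns the int16 samples and the applied
    scale, which can be stored to restore the original level later."""
    peak = max((abs(s) for s in samples), default=0.0)
    scale = target_peak / peak if peak > 0 else 1.0
    return [round(s * scale * 32767) for s in samples], scale

def dequantize(q, scale=1.0):
    """Map int16 values back to floats and undo the normalization scale."""
    return [v / 32767 / scale for v in q]

def max_error(a, b):
    return max(abs(u - v) for u, v in zip(a, b))

quiet = [0.0001, -0.00023, 0.00017, 0.00005]  # a very quiet signal
plain = [round(s * 32767) for s in quiet]      # quantized without normalization
normed, scale = normalize_and_quantize(quiet)
err_plain = max_error(quiet, dequantize(plain))
err_normed = max_error(quiet, dequantize(normed, scale))
print(err_normed < err_plain)  # True
```

The trade-off is that the absolute level is lost unless the scale is stored alongside the file.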

desh2608 · 2023-03-21T16:31:54Z

@popcornell could this explain some of the clipping issues you had observed, or were you able to resolve them?

popcornell · 2023-03-22T01:35:54Z

I think they may still occur, because some arrays in CHiME-6 also contain clipping.
The only way to prevent it is to apply peak normalization when the peak is outside [-1, 1].
However, this also reduces the dynamic range, which could have an impact as well.
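A minimal sketch of the conditional peak normalization suggested here, assuming float signals with nominal range [-1, 1]; `peak_normalize_if_needed` is a hypothetical helper. Only signals whose peak exceeds the limit are scaled down, so in-range recordings keep their original level.

```python
def peak_normalize_if_needed(samples, limit=1.0):
    """Rescale samples only if their absolute peak exceeds `limit`.
    Signals already within [-limit, limit] are returned unchanged, preserving
    their level; louder ones are scaled so the peak equals `limit`.
    Returns the (possibly rescaled) samples and the applied scale."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= limit:
        return list(samples), 1.0
    scale = limit / peak
    return [s * scale for s in samples], scale

print(peak_normalize_if_needed([0.2, -0.5]))  # ([0.2, -0.5], 1.0)
print(peak_normalize_if_needed([0.5, -2.0]))  # ([0.25, -1.0], 0.5)
```

For a file that does get rescaled, its quieter parts move closer to the quantization floor once written as fixed point, which is the dynamic-range cost mentioned above.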

[Fix] Correct BAN scaling

2ad0251
