Using KS with different sampling rates #10
At the moment, only one sampling rate can be used in the blackboard system. I see the following points:
Please discuss this now so that we can conclude very soon: we want to start a large range of new ADREAM scene model trainings, and it would be good to do them at 16 kHz already if we are going to use that rate later.
I would agree that 16 kHz is suitable for most applications. Although the localization models are trained with signals sampled at 44.1 kHz, they should still work at lower sampling frequencies because the ITD estimation incorporates an interpolation stage.
For some of the tasks related to quality evaluation (for example, prediction of coloration) we definitely need 44.1 kHz, and I would prefer to use that in all our knowledge sources. But I can understand that it may be advantageous to use 16 kHz in most cases, as testing and learning will then be faster in the DASA case. I have to think about this until tomorrow.
We could incorporate a switch that would allow us to resample the input depending on the task.
I would also like to work with 16 kHz since everything is so much faster. Hagen, if prediction of coloration is the only scenario where 44.1 kHz is needed, is it possible to use the switch available in the AFE KS to specify 44.1 kHz? Currently, when we specify 16 kHz in the AFE KS, the signal is downsampled in every block (here a block is what is defined in the binaural simulator: 4096 samples at 44.1 kHz). It would be nicer to have a KS accumulate signals to the desired length, say 0.5 s, and then downsample them as a whole.
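The accumulate-then-downsample idea could be sketched roughly as below. This is a hypothetical illustration, not the actual AFE or blackboard API: block size, chunk length, and the naive linear-interpolation resampler (a real implementation would low-pass filter first) are all assumptions.

```python
# Hypothetical sketch: accumulate 4096-sample blocks at 44.1 kHz until
# 0.5 s is available, then resample the whole chunk to 16 kHz in one
# step instead of resampling every single block.

FS_IN, FS_OUT = 44100, 16000
BLOCK_LEN, CHUNK_SEC = 4096, 0.5

def resample_linear(x, fs_in, fs_out):
    """Naive linear-interpolation resampler (illustration only;
    a proper resampler would apply an anti-aliasing filter first)."""
    n_out = int(len(x) * fs_out / fs_in)
    y = []
    for i in range(n_out):
        t = i * fs_in / fs_out          # position in input samples
        k = int(t)
        frac = t - k
        x0 = x[k]
        x1 = x[k + 1] if k + 1 < len(x) else x[k]
        y.append(x0 + frac * (x1 - x0))
    return y

class ChunkAccumulator:
    """Collect simulator blocks; emit one resampled 0.5 s chunk at a time."""
    def __init__(self):
        self.buf = []

    def push(self, block):
        self.buf.extend(block)
        need = int(FS_IN * CHUNK_SEC)   # 22050 samples at 44.1 kHz
        if len(self.buf) >= need:
            chunk, self.buf = self.buf[:need], self.buf[need:]
            return resample_linear(chunk, FS_IN, FS_OUT)
        return None

acc = ChunkAccumulator()
out = None
for _ in range(6):                      # 6 * 4096 = 24576 >= 22050 samples
    out = acc.push([0.0] * BLOCK_LEN) or out
print(len(out))                         # 22050 * 16000/44100 = 8000 samples
```

Resampling once per 0.5 s chunk also amortizes the per-call overhead that a per-block resampler pays on every 4096-sample block.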
I see there would be a benefit if we would allow for 16 kHz. First, we should summarize the current behavior and what is possible at what stage. I think @fietew should also be involved in the discussion. I will start with a few bullet points, it would be nice if everyone could add missing points: Current behavior (please fill in the answers to the questions):
Ideas for proposed behavior:
For the question of where we do the resampling: is there a performance difference between resampling in the binaural simulator and in the auditory front-end, or do they all use the same Matlab function? What is the influence of block length on resampling performance?
During the discussions for #14, @ningma97 pointed out that it doesn't matter for …. Does this mean … for …?
I tested:

```
>> localise
-------------------------------------------------------------------------
Source direction   DnnLocationKS w head rot.   DnnLocationKS wo head rot.
-------------------------------------------------------------------------
Error using -
Matrix dimensions must agree.

Error in DnnLocationKS/execute (line 95)
    testFeatures = testFeatures - ...

Error in Scheduler/executeFirstExecutableAgendaOrderItem (line 63)
    nextKsi.ks.execute();

Error in Scheduler/processAgenda (line 29)
    [exctdKsi,cantExctKsis,~] = ...

Error in BlackboardSystem/run (line 217)
    obj.scheduler.processAgenda();

Error in estimateAzimuth (line 18)
    bbs.run();

Error in localise (line 37)
    phi1 = estimateAzimuth(sim, 'BlackboardDnn.xml'); % DnnLocationKS w head movements
```
DnnLocationKS uses the cross-correlation output, which has a different number of lags depending on the sampling rate, and thus a different feature dimension.
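The dimension mismatch can be made concrete with a small illustration. The ±1.1 ms lag window below is an assumption (a typical maximum ITD for a human head), not the AFE's actual default: a cross-correlation computed over a fixed physical lag window contains a number of lag bins proportional to the sampling rate.

```python
# Why cross-correlation features change size with the sampling rate:
# a fixed physical lag window of +/- MAX_LAG_SEC spans more samples
# at a higher sampling rate, so the feature vector grows with fs.

MAX_LAG_SEC = 0.0011  # assumed +/-1.1 ms window; the AFE default may differ

def n_lags(fs):
    """Number of cross-correlation lag bins for a +/-MAX_LAG_SEC window."""
    half = round(fs * MAX_LAG_SEC)
    return 2 * half + 1

print(n_lags(16000))   # 2*18 + 1 = 37 bins
print(n_lags(44100))   # 2*49 + 1 = 99 bins
```

A DNN trained on 37-dimensional lag vectors at 16 kHz will therefore reject the 99-dimensional vectors produced at 44.1 kHz, which matches the "Matrix dimensions must agree" error above.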
Ah, OK, that is what @Hardcorehobel meant with his comment:
Maybe we could implement this by making the sampling frequency a mandatory parameter when asking the AFE for the cross-correlation output (if this is not done yet).
The sampling rate of the binaural simulator is constrained by the sample rate of the measured HRTF/BRTF datasets. The signals of the involved sound sources can be resampled while loading the respective *.wav files using …
But could you still resample the output of the buffer, after the convolution between the signals and the HRIR/BRIR?
@Hardcorehobel: so far we've only used the default 80..8000 Hz range for our features. I guess you have set this as the default because the higher frequencies are not important for speech processing. Is this, however, also true for the more general case of sound type detection? Now that I think about it, humans hear up to 20 kHz, probably not without reason, and music is sampled at 44100 Hz so as not to lose information, right?
Hah, and thinking more about it: the default 80..8000 Hz are the center frequencies of the filters, right? So what is the actual frequency range of the highest filter, then? If we sample at only 16 kHz, all information above 8 kHz is lost, so the highest filter would probably already lose information, correct?
Hossa, indeed the frequency range determines the range of the center frequencies. Since the bandwidth of the filters increases with frequency, the filters cover a wide range at high frequencies. When using 44.1 kHz one could of course go all the way up to the Nyquist frequency, but the signal will be quite noisy in realistic conditions. Indeed, one should avoid placing the highest filter directly at the Nyquist frequency. Only if the number of filters is specified will filters be placed at the lower and upper ends of the frequency range.
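A rough back-of-the-envelope check of the point about filter bandwidths, using the standard Glasberg and Moore ERB formula (an assumption here; the AFE's gammatone filters may use a slightly different bandwidth definition): a filter centred at 8 kHz is several hundred Hz wide, so with 16 kHz material (Nyquist = 8 kHz) its upper skirt already extends past the highest representable frequency.

```python
# Bandwidth of an auditory filter centred at 8 kHz, via the
# Glasberg & Moore (1990) equivalent rectangular bandwidth formula.

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz for centre frequency fc_hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

fc = 8000.0
bw = erb(fc)
print(round(bw, 1))        # ~888.2 Hz wide at an 8 kHz centre frequency
print(fc + bw / 2 > 8000)  # True: upper edge exceeds the 16 kHz Nyquist
```

This supports the advice above: at 16 kHz one should keep the highest centre frequency comfortably below 8 kHz rather than placing a filter directly at the Nyquist frequency.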
In order to come to some action points and conclusions on this, I created TWOEARS/auditory-front-end#5, which should ensure that the output of the AFE is independent of the used sampling frequency (for example, by returning time in seconds and not in samples). Are there more points where we should do something or change the behavior?
Just to be clear: you are not requesting a resampling processor, but a representation in terms of a time vector that is independent of the sampling frequency, right?
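The time-vector idea is simple enough to state in two lines; the sketch below is illustrative and the function name is hypothetical, not the actual AFE interface. The point is that two KSs running at different sampling rates agree on where an event is once positions are reported in seconds.

```python
# Sketch of the idea in TWOEARS/auditory-front-end#5: report positions
# in seconds rather than samples, so the representation no longer
# depends on the sampling rate a KS happens to use.

def samples_to_seconds(idx, fs):
    """Convert a 0-based sample index at rate fs to time in seconds."""
    return idx / fs

# The same physical instant, indexed at two different sampling rates:
t_44k = samples_to_seconds(22050, 44100)
t_16k = samples_to_seconds(8000, 16000)
print(t_44k, t_16k)  # both 0.5
```

Sample index 22050 at 44.1 kHz and index 8000 at 16 kHz name the same instant, 0.5 s, which is exactly what lets mixed-rate KSs exchange results.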
Yes, I thought this would be the easiest solution to guarantee that different KSs that would otherwise require different sampling rates can work together. On Mon Mar 14 11:03:26 2016 GMT+0100, Tobias May wrote:
I would like to close this issue. I guess our current solution is to use all KSs with the same sampling rate, isn't it?
Yes, we specify the sampling rate in the AFE.
Our two knowledge sources DnnLocationKS and GmmLocationKS are both trained for a sampling rate of 16000 Hz at the moment, and they are used with the following blackboard configuration:

Most of our other knowledge sources were developed with 44100 Hz in mind. So my question is: will this be a problem? Is it possible to get data at two different sampling rates from the auditory front-end in one blackboard? Or should we retrain the location knowledge sources to also use 44100 Hz?
/cc @ningma97 @chrschy @ivo--t