How to get NWaves MFCC data similar to librosa #48
Hi! Also, this seems strange:
What is the sampling rate and duration (in seconds) of the signal?
Thanks! I started from this page but had no success.
The sampling rate is 8000.
So the number of samples should be, indeed,
mfcc = librosa.feature.mfcc(y = audio, sr = 8000, n_mfcc=13, n_fft=1024, hop_length=int(np.floor(len(audio)/20)), dct_type=2, norm='ortho', htk=False, fmin=0, center=False, n_mels=128, window='hanning')
is equivalent to
int sr = 8000; // sampling rate
int fftSize = 1024;
int filterbankSize = 128;
var melBank = FilterBanks.MelBankSlaney(filterbankSize, fftSize, sr);
int hopLength = <just specify here the value stored in _ int(np.floor(len(audio)/20)) _ >
var opts = new MfccOptions
{
    SamplingRate = sr,
    FrameDuration = (double)fftSize / sr,
    HopDuration = (double)hopLength / sr,
    FeatureCount = 13,
    Filterbank = melBank,
    NonLinearity = NonLinearityType.ToDecibel,
    Window = WindowTypes.Hann,
    LogFloor = 1e-10f,
    DctType = "2N",
    LifterSize = 0
};
var extractor = new MfccExtractor(opts);
Note. Set PS. Your
Thank you very much for the example! I took this value from python debug code:
Also I set After all these I got What else could be wrong?
You need to find out why librosa returns 371499 samples. Because
Also, do you understand what the UPD. According to librosa docs
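One possible explanation for the 371499 figure, offered as an assumption because the thread does not show how the file was loaded: `librosa.load` resamples to 22050 Hz by default unless `sr=None` is passed, and the resampled length is the original length scaled by the rate ratio and rounded up. The sketch below only does that arithmetic with the numbers quoted in this thread; the helper name `resampled_length` is hypothetical.

```python
import math

def resampled_length(n_samples, orig_sr, target_sr):
    """Length of a signal after resampling, rounded up to a whole sample."""
    return math.ceil(n_samples * target_sr / orig_sr)

# Numbers quoted in this thread: 134784 samples at 8000 Hz (NWaves),
# 371499 samples reported by len(audio) in librosa.
native_len = 134784
native_sr = 8000
assumed_default_sr = 22050  # assumption: librosa.load was called without sr=None

predicted = resampled_length(native_len, native_sr, assumed_default_sr)
print(predicted)  # 371499
```

If this is what happened, loading with the native sampling rate (for example `librosa.load(path, sr=None)`) should give the same sample count in both libraries.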
Thanks, I will try to find out.
I've already found out (see my previous comment):
Simply set:
Thanks again for help!
You need to analyze the results more carefully. Compare them frame by frame. The values differ only very slightly, and this is because of round-off errors. As we can see, the algorithm is implemented correctly. In the first frame of your signal (and in many others as well) the first coefficient seems very different, because the corresponding frame contains silence (sample values are very close to 0). Essentially, in this case you have some big value in mfcc_0 and zeros in the other coefficients (NWaves shows you 10e-5... 10e-7, but basically they are zeros). In any case, frames containing silence will most likely be discarded during feature analysis. Also, read more about:
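To illustrate the point about silent frames: if every mel band of a frame is clamped to the same log floor, the log-mel vector is constant, and an orthonormal DCT-II of a constant vector puts all of its energy into coefficient 0. The sketch below is a plain-Python DCT written for illustration only, not the code path of either library; the value -100 dB is an assumption that corresponds to `10 * log10(1e-10)` with the `LogFloor` used above.

```python
import math

def dct2_ortho(x):
    """Orthonormal DCT-II of a sequence (same scaling as norm='ortho')."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

# A "silent" frame: all 128 mel bands sit at the assumed floor of -100 dB.
silent_log_mel = [-100.0] * 128
coeffs = dct2_ortho(silent_log_mel)[:13]

print(coeffs[0])                           # large negative value, -100 * sqrt(128)
print(max(abs(c) for c in coeffs[1:]))     # round-off noise, effectively zero
```

Tiny non-zero values in coefficients 1 to 12 of such frames are therefore round-off noise, and their exact magnitude can legitimately differ between two implementations.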
Thank you very much for the details, I will investigate this!
I wanted to post the final solution and found errors in my code; this might help others.
Librosa code:
NWaves code:
Hello!
I'm trying to get the same array of data in NWaves as in librosa. I read the wiki and tried a lot of settings from it, but the results are not close to what I need.
Initial line using librosa was:
mfcc = librosa.feature.mfcc(y = audio, sr = sr, n_fft = int(2048/2), hop_length = int(np.floor(len(audio)/20)), n_mfcc = 13)
With just these settings I got arrays of different lengths:
273 items in librosa from shape (13, 21)
1685 * 13 in NWaves
Audio length is 371499 in librosa (len(audio)), 134784 in NWaves. File is mono.
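The librosa shape quoted above can be checked by hand. As a rough sketch, assuming librosa's documented framing (with `center=True`, the default, the signal is padded by `n_fft // 2` on each side before framing), the number of frames follows from the sample count, `n_fft`, and `hop_length`. The helper name `librosa_frame_count` is hypothetical.

```python
def librosa_frame_count(n_samples, n_fft, hop_length, center=True):
    """Number of STFT frames for a signal of n_samples samples."""
    if center:
        # center=True pads n_fft // 2 samples on both sides
        n_samples += 2 * (n_fft // 2)
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop_length

n_samples = 371499            # len(audio) reported for librosa
hop_length = n_samples // 20  # same as int(np.floor(len(audio) / 20))

frames_centered = librosa_frame_count(n_samples, 1024, hop_length, center=True)
frames_uncentered = librosa_frame_count(n_samples, 1024, hop_length, center=False)

print(frames_centered, 13 * frames_centered)  # 21 273
print(frames_uncentered)                      # 20
```

This reproduces the (13, 21) shape and the 273 items. A much larger frame count on the NWaves side suggests that its frame and hop durations had not been set to match `n_fft` and `hop_length`.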
Then I tried to understand which default parameters were used in librosa.
My results stayed the same with this list. By the way, if fmax is set to any value such as sr/2, the results change.
In any case, I didn't find all of these parameters in NWaves.
Could you please advise how to get the data in the same format as librosa?