
Making predictions with MFCC/stored audio file #24

Closed
PeteSahad opened this issue Dec 2, 2021 · 17 comments
Labels
enhancement (New feature or request) · help wanted (Extra attention is needed)

Comments

@PeteSahad

Hi,

I'm very new to Flutter and TensorFlow, so please bear with me if some of my questions don't make much sense :).

I'm trying to build an app that allows me to record some audio samples. Then I would like to do some classification with the recorded files.

My questions are:

  • Is it possible to make a prediction with a recorded file instead of using the audio stream? (à la model.predict(data) in Python/TensorFlow)
  • I'm using MFCCs in my trained model. I expect I would need to apply some transformation to the recorded audio files before feeding them to the model (as I do in Python; see the sketch below). To what degree is that even possible with this plugin?
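
For reference, what I do in Python looks roughly like this (a minimal sketch; the file name, the saved model and the 40-coefficient MFCC input are placeholders):

Python
import librosa
import numpy as np
import tensorflow as tf

# load the recorded file and extract MFCCs (placeholder path and parameters)
audio, sample_rate = librosa.load("recording.wav", sr=None)
mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)

# collapse the time axis so the model receives a single 40-value vector
features = np.mean(mfccs.T, axis=0).reshape(1, 40)

model = tf.keras.models.load_model("model.h5")  # placeholder model
prediction = model.predict(features)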

I hope you understand my problem.

Thanks in advance!

@Caldarie
Owner

Caldarie commented Dec 3, 2021

Hi @PeteSahad

To answer your question:

1. Currently, the plugin has no feature to use recorded audio. However, it would be possible to add to the plugin if the audio file were first converted to PCM16 (a rough sketch of that conversion follows after this list). I had planned to implement this feature in the future, but haven't had the time to do so. If you'd like to collaborate and take a shot at it, let me know.
This is now available on package 0.2.2+1

2. Currently, the plugin does not have a feature to convert audio files to MFCC. However, it is possible to add this feature to the plugin. The problem is that it will take much longer to implement, especially if I were to work on it myself. I think you're better off adding MFCC to the model's pipeline instead of manually extracting it with this plugin. More info here
This is now available on package 0.3.0 as an experimental feature
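
For illustration, reading a recorded WAV file into raw PCM16 samples looks roughly like this (a Python sketch using the standard wave module and a placeholder file name; in the plugin itself the equivalent would have to happen on the Android/iOS side):

Python
import wave
import numpy as np

# read the recorded file and pull out its raw frames
# (assumes a mono, 16-bit WAV file)
with wave.open("recording.wav", "rb") as wav:
    sample_rate = wav.getframerate()
    pcm_bytes = wav.readframes(wav.getnframes())

# interpret the bytes as signed 16-bit integers (PCM16)
pcm16 = np.frombuffer(pcm_bytes, dtype=np.int16)

# most models expect floats in [-1, 1], so scale accordingly
audio_float = pcm16.astype(np.float32) / 32768.0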

Let me know if my response answers your questions.

@PeteSahad
Author

Hi Michael,

thanks for the very quick response!

Right now I'm more or less experimenting with how to perform the classification. The requirements are still unclear as to whether I need stored audio files or should classify on the fly, so I'm mainly looking around for potential solutions.

But if I don't find any other solution, it would absolutely make sense for me to base my further work on yours. However, as I mentioned, I'm pretty new to Flutter/TensorFlow, so it would take some time before I can make contributions. Although I will have to implement a solution somehow :).

I'm currently going through your code to understand what you did and whether I could build on your work for my purposes.

I'll have to do some fundamental research first, since I only heard about MFCC for the first time yesterday, so I can't really assess your answer to 2) right now ;).

But thank you very much so far!

@Caldarie
Owner

Caldarie commented Dec 3, 2021

Hi @PeteSahad,

No problems at all.

If you have any questions, let me know. I would be happy to assist you if I can.

@Caldarie Caldarie added the enhancement (New feature or request) label Dec 9, 2021
@Caldarie
Owner

Hey guys, just an update that I'm currently working on making predictions using stored audio. I will post an update once I get it to work.

@PeteSahad
Author

Awesome news!

I'm currently working on a different project, but I'll get back to my audio project in a few weeks. I'm definitely going to use that feature!

I was also thinking about MFCC and feature extraction. I found another project that exposes librosa functionality in Java -> JLibrosa. I'm not sure right now whether that project is still maintained, but I tried a few things and it seemed to work with only small adjustments.
Maybe it is also interesting for your project.

@Caldarie
Owner

Ah, thanks for the input. I had planned to transcribe the JLibrosa library to Swift, but hadn't had the chance to do so. I may work on this feature once I'm done with my current project.

As for the feature to load audio, it's a difficult one to implement (especially on Android). You can find my progress here. It's on a different branch from master.

@Caldarie
Owner

Caldarie commented Jan 7, 2022

Hi, I have published a new release 0.2.2+2. This should allow you to make inferences on stored audio.

If you do run into any bugs, please let me know.

@Caldarie
Owner

Hi @PeteSahad,

I hope it’s not too much to ask, but I was wondering if you could help clarify a concept, since you had some success with the JLibrosa library.

I’ve been trying to wrap my head around MFCC at a deeper level, but I could not figure out how to feed the spectrogram (extracted from the library) to the model. How did you manage it? What values did you use for your parameters, i.e. mel_bin, hop_length, etc.?

@PeteSahad
Author

Hi Caldarie,

Unfortunately, I haven't made it very far yet; I had to work on another project.

I used the following parameters:

Python
spectrogram = librosa.feature.melspectrogram(audio, sr=sample_rate, n_mels=128, n_fft=2048, hop_length=256) 
Java
float [][] melSpectrogram = jLibrosa.generateMelSpectroGram(audioFeatureValues, sampleRate, 2048, 128, 256);

However, I don't really use the spectrogram when building the model. I just used the two functions above to verify that they produce the same values, which they do. [I went step by step to figure out what's done in Python and what the equivalent in Java would be.]

I stopped at the next steps which are in python:

Python
mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
mfccs_scaled_features = np.mean(mfccs_features.T,axis=0)

In JLibrosa it should be:

Java
float[][] mfccValues = jLibrosa.generateMFCCFeatures(audioFeatureValues, sampleRate, 40);
float[] meanMFCCValues = jLibrosa.generateMeanMFCCFeatures(mfccValues, mfccValues.length, mfccValues[0].length);

but I only get garbage data for meanMFCCValues.

My next step was to check what jLibrosa.generateMeanMFCCFeatures is doing exactly.

I'm not even sure if jLibrosa.generateMeanMFCCFeatures really is the Java equivalent of Python's np.mean(mfccs_features.T, axis=0).

As I said, I'm very new to the topic, so I might be on the wrong path... but I'll start looking deeper into it over the next days and weeks.
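
For reference, the NumPy line just takes the per-coefficient mean across all frames, so the result is one 40-value vector (a minimal sketch with placeholder data):

Python
import numpy as np

# mfccs_features has shape (40, n_frames): 40 coefficients per frame
mfccs_features = np.random.randn(40, 120).astype(np.float32)

# transpose to (n_frames, 40), then average over the frames (axis=0)
mean_mfcc = np.mean(mfccs_features.T, axis=0)

print(mean_mfcc.shape)  # (40,) -- one mean value per MFCC coefficient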

@PeteSahad
Author

Looked into it and figured: if you write garbage code, you get garbage data ;).

I now get the correct data from jLibrosa.generateMeanMFCCFeatures. I will now try to build the model in Java with this data...

@Caldarie
Owner

Hi @PeteSahad

Many thanks for sharing your information. Appreciate it.

I've been trying to implement the Mel spectrogram or MFCC as an input type for this plugin. Unfortunately, the information out there isn't very straightforward about how to fit the spectrogram to the model's input shape.

If you do come across such information, do let me know.
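
For what it's worth, one approach (a rough Python sketch, assuming a model that takes the (1, 40) mean-MFCC vector discussed above; the model path and data are placeholders) is to collapse the time axis and hand the pooled vector to the TFLite interpreter:

Python
import numpy as np
import tensorflow as tf

# mfccs has shape (40, n_frames); average over frames to get 40 values
mfccs = np.random.randn(40, 120).astype(np.float32)  # placeholder data
features = np.mean(mfccs.T, axis=0).reshape(1, 40).astype(np.float32)

# run the .tflite model on the pooled features (placeholder path)
interpreter = tf.lite.Interpreter(model_path="sound_classifier.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

interpreter.set_tensor(input_details[0]["index"], features)
interpreter.invoke()
scores = interpreter.get_tensor(output_details[0]["index"])
print(scores)  # e.g. probabilities for [cough, hiss]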

@Caldarie Caldarie changed the title from "Making predictions with a stored audio file" to "Making predictions with MFCC/stored audio file" Jan 13, 2022
@Caldarie Caldarie added the help wanted (Extra attention is needed) label Jan 13, 2022
@Caldarie
Owner

Hi @PeteSahad

I would like to do a few tests for MFCC, but I don’t have a model with that input type. Would you be willing to share a model? Any model is fine, as long as it has MFCC as the input.

@PeteSahad
Author

I only have this basic model for testing: https://drive.google.com/file/d/10ixguuoUKxsryu0MhNZcS19BqNNeOOWD/view?usp=sharing

It has two labels (cough/hiss) to distinguish between a hiss and a cough (who would have guessed...).

The input tensor is (1, 40).

@Caldarie
Owner

Caldarie commented Jan 17, 2022

Hi @PeteSahad

Thanks for the model.

Just an update: although I ran across some problems with rubbish outputs (NaN and infinite values), I solved the problem by padding those values with real numbers.
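
(For illustration, the NumPy equivalent of that padding step would be something along the lines of np.nan_to_num; the exact replacement values used in the plugin may differ:)

Python
import numpy as np

mfcc = np.array([0.5, np.nan, np.inf, -np.inf], dtype=np.float32)

# replace NaN/Inf with finite numbers before feeding them to the model
clean = np.nan_to_num(mfcc, nan=0.0, posinf=0.0, neginf=0.0)
print(clean)  # [0.5 0. 0. 0.]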

Once I figure out how to get iOS/Swift running, I will publish an update.

@Caldarie
Owner

Caldarie commented Mar 8, 2022

Hi @PeteSahad

Maybe a bit late, but if you are still interested in using MFCC, I have added the feature for both Android and iOS on this branch here.

If you have time, can you run a few tests with your own model and let me know if it works for you?

@bdytx5

bdytx5 commented Mar 10, 2022

Hey guys,

As for using MFCCs in your plugin, I would not recommend doing this level of preprocessing outside of the model. It requires the same preprocessing code in both the training step and the inference step, which is difficult to keep in sync and may require continual refactoring... My recommendation is to do the MFCC (or any other preprocessing) in the model itself, which is doable, especially with Keras and TensorFlow, and is supported by TFLite too!
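
For illustration, folding the MFCC computation into the model with TensorFlow's signal ops might look roughly like this (a sketch, not the exact pipeline used here; the frame sizes, mel parameters and label count are placeholders):

Python
import tensorflow as tf

class MFCC(tf.keras.layers.Layer):
    """Computes MFCC frames from a raw waveform inside the model graph."""

    def __init__(self, sample_rate=16000, n_mels=128, n_mfcc=40, **kwargs):
        super().__init__(**kwargs)
        self.sample_rate = sample_rate
        self.n_mels = n_mels
        self.n_mfcc = n_mfcc

    def call(self, waveform):
        stft = tf.signal.stft(waveform, frame_length=2048, frame_step=256)
        spectrogram = tf.abs(stft)
        mel_matrix = tf.signal.linear_to_mel_weight_matrix(
            num_mel_bins=self.n_mels,
            num_spectrogram_bins=spectrogram.shape[-1],
            sample_rate=self.sample_rate)
        mel = tf.tensordot(spectrogram, mel_matrix, 1)
        log_mel = tf.math.log(mel + 1e-6)
        mfcc = tf.signal.mfccs_from_log_mel_spectrograms(log_mel)
        return mfcc[..., :self.n_mfcc]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(16000,)),            # one second of 16 kHz audio
    MFCC(),
    tf.keras.layers.GlobalAveragePooling1D(),  # mean over time, like np.mean above
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. cough vs. hiss
])

This way, training and inference share exactly the same preprocessing. Be aware that converting such a graph to TFLite may require the Select TF ops option, depending on the TensorFlow version.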

@Caldarie
Owner

@bdytx5 Thank you for the input.

Likewise, I concur with your recommendation.

For those who still wish to use MFCC, I have left this feature open in the new update 0.3.0. Be aware, though, that it's an experimental feature and may not produce the intended results.
