
Need help with the inputs #35

Open
tamnguyenvan opened this issue May 22, 2022 · 14 comments

Comments

@tamnguyenvan

Hi, I have a dumb question. My model receives outputs of librosa.load(audio_file, sr=16000) as inputs. How can I reproduce it with your code?

Thank you.

@Caldarie
Owner

Caldarie commented May 22, 2022

Hmm, am I correct to assume that you want to load an audio file and get back an array of 16-bit values? If yes, you may need to edit the package to do so. For example, on Android, the code below returns spliced arrays of 16-bit values, which are then fed into the model. If you just want the array, replace `this::startRecognition` with your own function.

    public void preprocess(byte[] byteData) {
        Log.d(LOG_TAG, "Preprocessing audio file..");

        audioFile = new AudioFile(byteData, audioLength);
        audioFile.getObservable()
                .doOnComplete(() -> {
                    stopStream(); 
                    clearPreprocessing();
                })
                .subscribe(this::startRecognition);  //EDIT THIS CODE HERE
        audioFile.splice();
    }

If you want the package to take care of the recognition as well, all you need to do is invoke the code below from the TfliteAudio package (set sampleRate to match your model, e.g. 16000). This should have the same effect as librosa.load(audio_file, sr=16000):

recognitionStream = TfliteAudio.startFileRecognition(
  sampleRate: 44100,
  audioDirectory: "assets/sampleAudio.wav",
  );
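One thing to watch out for: librosa.load returns float32 samples scaled to [-1.0, 1.0), while the plugin's spliced arrays hold 16-bit integers. A minimal sketch of the conversion in Python (the function name is illustrative, not part of either library):

```python
import numpy as np

def pcm16_to_float(samples):
    """Scale 16-bit PCM samples to float32 in [-1.0, 1.0),
    the range that librosa.load produces."""
    return np.asarray(samples, dtype=np.int16).astype(np.float32) / 32768.0
```

For example, `pcm16_to_float([0, 16384, -32768])` yields `[0.0, 0.5, -1.0]`.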

@SanaSizmic

SanaSizmic commented Dec 14, 2022

Hi @Caldarie,
signal, sample_rate = librosa.load(file_path)

My model takes a signal of fixed 1-second duration at a sample rate of 22050 as input.
I tested it locally in Python and it gives the correct output, but when I try it in Flutter using flutter_tflite_audio, it gives incorrect output.

Can you please guide me on where and what I should change in the "::startRecognition" code above?

Thanks,

@Caldarie
Owner

Caldarie commented Dec 14, 2022

Hi @SanaSizmic

Am I correct to assume that you want to load the audio file and then output an array of float values?

In that case, you can simply modify the code below to:

subscribe(value -> print(value));

You might want to double-check the syntax; it's been a while since I've touched Java.

@SanaSizmic

Hi @Caldarie ,

When I tested locally using Python, my model gives me [0.07594858, 0.9240514] as the predicted output, which is the correct prediction. For the same audio.wav file, when I tested using flutter_tflite_audio, it gives [0.27258825, 0.72741175], which is incorrect. Can you please suggest what I should change in the flutter_tflite_audio package?

Thanks,

@Caldarie
Owner

Hi @SanaSizmic

I see what you mean. I suspect that the float values are distorted during extraction, or that the audio file is not spliced correctly.

If possible, can you compare the values from librosa.load with those from subscribe(value -> print(value)) and tell me whether they are similar to each other?
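For the comparison, something along these lines might help once both arrays have been dumped to Python (a sketch; the function name and tolerance are made up, not part of any library):

```python
import numpy as np

def compare_signals(a, b, atol=1e-3):
    """Compare two audio arrays over their overlapping length and
    report the largest absolute difference and whether they match."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    n = min(len(a), len(b))
    max_diff = float(np.abs(a[:n] - b[:n]).max())
    return max_diff, max_diff <= atol
```

If the max difference is large right from the first samples, extraction is the likely culprit; if the arrays diverge only at chunk boundaries, splicing is.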

@SanaSizmic

SanaSizmic commented Dec 15, 2022

Hi @Caldarie ,

No, they are not similar to each other.
When I print it, I get sets of different arrays, and every array generates a different output. I also suspect that the audio files are not spliced correctly.

Instead of splicing the audio file, can I give the whole file to the model?
Can you please guide me on how to fix this?

Thanks,

@Caldarie
Owner

Caldarie commented Dec 15, 2022

Instead of splicing the audio file can I give the whole file to the model? So can you please guide me on how can I fix this

That really depends on your model. If the audio file has the correct number of samples per second, then there's no need to splice it.

No, it's not similar to each other. when I print it, it gives sets of different arrays and every array generates a different output, I also think the same that the audio files are not spliced correctly.

Take a look at the following code. You can test it to find errors.

From TfliteAudioPlugin.java, data extraction starts here:

private byte[] extractRawData(AssetFileDescriptor fileDescriptor, long startOffset, long declaredLength) {
        Log.d(LOG_TAG, "Extracting byte data from audio file");

        MediaDecoder decoder = new MediaDecoder(fileDescriptor, startOffset, declaredLength);
        AudioProcessing audioData = new AudioProcessing();

        byte[] byteData = {};
        byte[] readData;
        while ((readData = decoder.readByteData()) != null) {
            byteData = audioData.appendByteData(readData, byteData);
            Log.d(LOG_TAG, "data chunk length: " + readData.length);
        }
        Log.d(LOG_TAG, "byte data length: " + byteData.length);
        return byteData;

    }

From AudioFile.java, the conversion from byte to short starts here:

shortBuffer = ByteBuffer.wrap(byteData).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer();
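The same conversion can be reproduced in Python to cross-check the values (a sketch; `"<i2"` means little-endian signed 16-bit, matching ByteOrder.LITTLE_ENDIAN):

```python
import numpy as np

def bytes_to_shorts(byte_data):
    """Reinterpret raw bytes as little-endian signed 16-bit samples,
    mirroring ByteBuffer.wrap(...).order(LITTLE_ENDIAN).asShortBuffer()."""
    return np.frombuffer(byte_data, dtype="<i2")
```

For example, `bytes_to_shorts(b"\x00\x01\xff\x7f")` yields `[256, 32767]`. If the Java side prints different shorts for the same bytes, the endianness or WAV header handling is off.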

For splicing, take a look at the code below. I have written some unit tests (found here) to verify this algorithm; feel free to check it yourself for any problems.

    public void splice() {
        isSplicing = true;

        for (int i = 0; i < shortBuffer.limit(); i++) {

            short dataPoint = shortBuffer.get(i);

            if (!isSplicing) {
                subject.onComplete();
                break;
            }

            switch (audioData.getState(i)) {
                case "append":
                    audioData
                        .append(dataPoint);
                    break;
                case "recognise":
                    Log.d(LOG_TAG, "Recognising");
                    audioData
                        .append(dataPoint)
                        .displayInference()
                        .emit(data -> subject.onNext(data))
                        .reset();
                    break;
                case "finalise":
                    Log.d(LOG_TAG, "Finalising");
                    audioData
                        .append(dataPoint)
                        .displayInference()
                        .emit(data -> subject.onNext(data));
                    stop();
                    break;
                case "padAndFinalise":
                    Log.d(LOG_TAG, "Padding and finalising");
                    audioData
                        .append(dataPoint)
                        .padSilence(i)
                        .displayInference()
                        .emit(data -> subject.onNext(data));
                    stop();
                    break;

                default:
                    throw new AssertionError("Incorrect state when preprocessing");
            }
        }
    }
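Roughly, the loop above walks the sample buffer and emits fixed-length chunks, padding the final one with silence. The same idea in a few lines of Python, useful for checking expected chunk counts and padding (a sketch for testing, not the plugin's actual code):

```python
import numpy as np

def splice(signal, audio_length):
    """Split a signal into chunks of audio_length samples,
    zero-padding the final chunk (cf. the padSilence case above)."""
    chunks = []
    for start in range(0, len(signal), audio_length):
        chunk = np.asarray(signal[start:start + audio_length], dtype=np.float32)
        if len(chunk) < audio_length:
            chunk = np.pad(chunk, (0, audio_length - len(chunk)))
        chunks.append(chunk)
    return chunks
```

Comparing the chunks this produces against what subscribe(...) emits for the same input would narrow down where the arrays diverge.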

@SanaSizmic

SanaSizmic commented Dec 27, 2022

Hi @Caldarie,

SAMPLES_TO_CONSIDER = 22050
signal, sample_rate = librosa.load(file_path)

if len(signal) >= SAMPLES_TO_CONSIDER:
    # ensure consistency of the length of the signal
    signal = signal[:SAMPLES_TO_CONSIDER]

else:
    signal = fix_length(signal, size=int(1*sample_rate), mode='edge')


# predictions = self.model.predict(signal)

Can I do this using flutter_tflite_audio: read the raw audio data, convert it to the fixed sample-rate length, and predict?
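For reference, the trim-or-pad step in the snippet above can also be written with plain numpy, which may be easier to port into the plugin (a sketch; assumes the samples have already been converted to floats):

```python
import numpy as np

def fix_to_one_second(signal, sample_rate=22050):
    """Trim or edge-pad a signal to exactly one second of samples,
    mirroring librosa.util.fix_length(signal, size=sample_rate, mode='edge')."""
    signal = np.asarray(signal, dtype=np.float32)
    if len(signal) >= sample_rate:
        return signal[:sample_rate]
    return np.pad(signal, (0, sample_rate - len(signal)), mode="edge")
```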
Thanks,

@Caldarie
Owner

Caldarie commented Jan 3, 2023

@SanaSizmic sorry for the late reply.

Yeah, you can absolutely do something similar by editing the code in this plugin.

@SanaSizmic

Hi @Caldarie,
Can you please explain to me how the plugin works now? The structure: first it takes the raw input signal array, then splices it to what length?
Or, in order to do what I shared in the code above, which files do I have to edit?
If you can guide me, that would be highly appreciated.
Thanks

@Caldarie
Owner

Caldarie commented Jan 5, 2023

As mentioned above, all you need to do is tweak the code provided below. The subscription returns an array of samples, which you can use to implement the code you provided.

subscribe(value -> print(value));

@SanaSizmic

SanaSizmic commented Jan 10, 2023

Hi, I have a dumb question. My model receives outputs of librosa.load(audio_file, sr=16000) as inputs. How can I reproduce it with your code?

Thank you.

Hi @tamnguyenvan, did you manage to figure this out?

@SanaSizmic

SanaSizmic commented Jan 10, 2023

Hi @Caldarie,

As mentioned above, all you need to do is tweak the code provided below. The subscription returns an array of samples, which you can use to implement the code you provided.

subscribe(value -> print(value));

Do you mean this code in the TfliteAudioPlugin.java file?

public void preprocess(byte[] byteData) {
        Log.d(LOG_TAG, "Preprocessing audio file..");

        audioFile = new AudioFile(byteData, audioLength);
        audioFile.getObservable()
                .doOnComplete(() -> {
                    stopStream(); 
                    clearPreprocessing();
                })
                .subscribe(this::startRecognition);
        audioFile.splice();
    } 

This output.txt is my output file. Can you please check it and let me know if anything is missing?

@Caldarie
Owner

Hmm, everything seems to be in order. The question is whether it's producing accurate results.
