
Change recording length using GTM models to allow audio inputs greater than 1 second #8

Open
cmalbuquerque opened this issue Mar 12, 2021 · 4 comments

Comments

@cmalbuquerque

I am using a GTM model and I am trying to increase the recording length passed to the model. To analyse 1 second of audio, I am using the following configuration:

      numOfInferences: 1,
      inputType: 'rawAudio',
      sampleRate: 44100,
      recordingLength: 44032,
      bufferSize: 22016,

Instead of analysing just 1 second, I want to increase the audio input to 3-5 seconds. But when I change the recordingLength to 132096 (3 x 44032), the sampleRate to 132300 (3 x 44100), and the bufferSize to half of the recordingLength value, the inference crashes.

Is there any way to record and send the model audio longer than one second, knowing that the GTM model requires an input tensor of size 44032?
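For reference, the relationship between recordingLength, sampleRate, and clip duration can be sketched as follows (a rough illustration, assuming recordingLength counts samples and the model consumes one fixed-size tensor per inference):

```python
# Duration of the recorded clip in seconds, assuming recordingLength
# is a sample count and sampleRate is the capture rate in Hz.
def clip_duration(recording_length, sample_rate):
    return recording_length / sample_rate

# The original configuration: roughly one second of audio.
print(clip_duration(44032, 44100))   # ~0.998 s

# Three windows' worth of samples at the real hardware rate would be
# ~3 s; note that sampleRate should describe the actual capture rate
# (e.g. 44100 Hz), not a scaled-up value like 132300.
print(clip_duration(132096, 44100))  # ~2.995 s
```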

@Caldarie
Owner

Caldarie commented Mar 12, 2021

Hi @cmalbuquerque

Unfortunately, the recording length needs to be a fixed size. The good news, however, is that you can lengthen your recording by reducing your buffer size. For example:

  1. For a very long recording time, try a recordingLength of 44032 and a bufferSize of 2000.
  2. For a moderate recording time, try a recordingLength of 44032 and a bufferSize of 8000.
  3. For a very short recording time, try a recordingLength of 44032 and a bufferSize of 22050.

You may want to experiment with different bufferSizes to get the length you want.

Just be aware that it is difficult to hit an exact number of seconds, as recording times may differ from device to device. Also, if you shrink the bufferSize to a very small value, it may adversely affect your inference accuracy.
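The suggestion above can be summarised numerically. A minimal sketch, assuming the recorder fills one fixed recordingLength-sample window from successive bufferSize-sample reads (so smaller buffers mean more reads, which on real devices stretches the wall-clock recording time):

```python
import math

# Number of buffer reads needed to fill one recordingLength-sample
# window; the per-device overhead of each read is what makes smaller
# buffers record for longer in practice (device-dependent).
def reads_per_window(recording_length, buffer_size):
    return math.ceil(recording_length / buffer_size)

for buf in (22050, 8000, 2000):  # the three suggested settings above
    print(buf, reads_per_window(44032, buf))
```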

@cmalbuquerque
Author

cmalbuquerque commented Mar 12, 2021

@Caldarie nice, thanks!

I thought the bufferSize could only be set to half of the recordingLength value... I decreased the sample rate to 16 kHz and set the bufferSize to 2000, and I got approximately 3 seconds of audio. 44.1 kHz improves the accuracy, but I believe that if I build the model with good audio samples and very distinct classes and train it, it will still be able to produce accurate inferences.
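The roughly 3 seconds reported here also follows directly from the fixed tensor size, since the same 44032 samples now span more time at the lower capture rate:

```python
# 44032 samples captured at 16000 Hz instead of 44100 Hz:
duration = 44032 / 16000
print(duration)  # 2.752 s, i.e. close to the ~3 s observed
```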

Thanks again! 😁

@Caldarie
Owner

@cmalbuquerque Glad I could be of assistance.

You’re absolutely correct. It won’t really matter too much if you have distinct classes. Furthermore, if you listen closely, there’s not much of a difference between a sample rate of 16 kHz and 44.1 kHz.

@nazdream

nazdream commented Jul 19, 2021

@Caldarie Hi!
I am trying to count the number of specific sounds in the audio stream.

  1. The problem I have is that when I call TfliteAudio.startAudioRecognition and listen to the event stream, I receive events every 1 second, and I can't find a way to increase the event frequency. Is it possible to decrease the interval duration to 50-100 ms?

  2. Another problem is that event['recognitionResult'] always returns "1 Result":
    result: {hasPermission=true, inferenceTime=75, recognitionResult=1 Result}
    However, there is more than one repetition of the sound I am trying to count in each interval. Should it work like this, and what does the number "1" mean? Is it the count of the sound in a single audio interval, or something else?

Is it possible to implement counting of a specific sound with this package, or should I look somewhere else? Any feedback would be helpful, thanks!
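One application-level approach to the counting question is to tally recognition events yourself rather than expecting the plugin to count repetitions. A minimal sketch, assuming each event carries a single recognised label; the event shape mirrors the map shown above, but the labels and values here are invented for illustration:

```python
# Hypothetical sketch: count how many recognition events in a session
# matched the target label. This counts at most one occurrence per
# inference interval, which is a limitation of per-interval recognition.
def count_target_events(events, target):
    return sum(1 for e in events if e.get("recognitionResult") == target)

# Invented example events (field names follow the result map above).
events = [
    {"hasPermission": True, "inferenceTime": 75, "recognitionResult": "clap"},
    {"hasPermission": True, "inferenceTime": 80, "recognitionResult": "silence"},
    {"hasPermission": True, "inferenceTime": 78, "recognitionResult": "clap"},
]
print(count_target_events(events, "clap"))  # 2
```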
