Counting specific sound occurrences in the audio #13

Closed
nazdream opened this issue Jul 19, 2021 · 27 comments

Comments

@nazdream

I am trying to count the number of occurrences of a specific sound in the audio.

The problem I have is that when I call TfliteAudio.startAudioRecognition and listen to the event stream, I receive events every 1 second, and I can't find a way to increase the event frequency to receive events every 50-100 ms. Is it possible to decrease the interval duration to 50-100 ms?

Another problem I have is that event['recognitionResult'] always returns "1 Result":
result:
{hasPermission=true, inferenceTime=75, recognitionResult=1 Result}
However, there is more than one repetition of the sound I am trying to count in each 1-second interval. Should it work like this, and what does the number "1" mean? Is it the count of the sound in a single audio interval or something else?

Is it possible to implement counting of a specific sound with this package, or should I look somewhere else? Any feedback would be helpful, thanks!

@Caldarie
Owner

Hi @nazdream,

Looking at your description, it looks like you’re trying to build a Sound Event Detection model. Correct me if I’m wrong here.

As for “1 Result”, I can check what’s wrong if you’re willing to share your label text file. Let me know if this is possible.

@nazdream
Author

Sure, give me 5 minutes please. I can upload the label file and tflite model to Google Drive and share access with this Gmail address: michaeltamthiennguyen@gmail.com. Will you be able to access it from there?

@Caldarie
Owner

@nazdream No problem, that should be fine.

@nazdream
Author

I have sent you the invite. Can you check whether you received it, please?

@nazdream
Author

Here is the code I used for testing the functionality I need:

import 'package:flutter/material.dart';
import 'package:tflite_audio/tflite_audio.dart';

class TestPage extends StatefulWidget {

  @override
  _TestPageState createState() => _TestPageState();
}

class _TestPageState extends State<TestPage> {
  bool _recording = false;
  int _results = 0;
  int _events = 0;
  Stream<Map<dynamic, dynamic>> _result;

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: ListView(
        children: [
          const SizedBox(height: 30),
          Center(
            child: Text('Audio'),
          ),
          const SizedBox(height: 30),
          Center(
            child: Container(
              width: 100,
              height: 100,
              decoration: BoxDecoration(
                borderRadius: BorderRadius.circular(100),
                color: _recording ? Colors.red : Colors.blue,
              ),
              child: Center(
                child: Text(_recording ? 'Recording...' : 'Idle'),
              ),
            ),
          ),
          const SizedBox(height: 30),
          Center(
            child: Text('Results: $_results'),
          ),
          Center(
            child: Text('Events: $_events'),
          ),
          const SizedBox(height: 30),
          Padding(
            padding: const EdgeInsets.symmetric(horizontal: 20),
            child: RaisedButton(
              onPressed: _recording ? _stop : _recorder,
              child: Text(_recording ? 'Stop' : 'Record'),
            ),
          ),
        ],
      ),
    );
  }

  void _recorder() {
    if (!_recording) {
      setState(() {
        _recording = true;
        _results = 0;
        _events = 0;
      });

      // Start streaming recognition; the plugin emits one event per inference.
      _result = TfliteAudio.startAudioRecognition(
        numOfInferences: 100,
        inputType: 'rawAudio',
        sampleRate: 44100,
        recordingLength: 44032,
        bufferSize: 22050,
        averageWindowDuration: 10,
        detectionThreshold: 0.6,
        suppressionTime: 10,
        minimumTimeBetweenSamples: 10,
      );

      _result.listen((event) {
        // Count every event, and separately every event recognised as the target label.
        setState(() {
          _events++;
        });
        if (event['recognitionResult'] == '1 Punch') {
          setState(() {
            _results++;
          });
        }
      }).onDone(() {
        setState(() {
          _recording = false;
        });
      });
    }
  }

  void _stop() {
    TfliteAudio.stopAudioRecognition();
  }
}

And in the app file I am initializing the model:

void _loadTFModel() async {
  String result = await TfliteAudio.loadModel(
    label: 'assets/ml/labels.txt',
    model: 'assets/ml/soundclassifier.tflite',
    numThreads: 2,
    isAsset: true,
  );
}

@Caldarie
Owner

Ah, it seems you are using Google's Teachable Machine.

So, just to clarify a few more things, you want to reduce the recording length from 1000ms to around 50-100ms? Is that correct?

@nazdream
Author

Yes

@nazdream
Author

nazdream commented Jul 19, 2021

Or, another option would be to see how many occurrences of a specific sound there are in a 1-second interval. If the startAudioRecognition event included the number of occurrences of the sound in the 1-second interval, that would also be perfect.
Something like this:
{hasPermission: true, inferenceTime: 71, recognitionResult: 0 Name, occurrences: 5}

@Caldarie
Owner

Caldarie commented Jul 19, 2021

Yes

In that case, I think it may be possible to do so. However, I have yet to test whether it works.

If you don't mind doing the testing for me, I suggest reducing the recordingLength to perhaps half, a quarter, or one eighth.

Let me know how it goes.

Or, another option would be to see how many occurrences of a specific sound there are in a 1-second interval. If the startAudioRecognition event included the number of occurrences of the sound in the 1-second interval, that would also be perfect.
Something like this:
{hasPermission: true, inferenceTime: 71, recognitionResult: 0 Name, occurrences: 5}

This may be possible, but I may need to change the source code a bit to achieve this effect.
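
For reference, here is a rough sketch (not code from the thread) of what those suggested reductions work out to, assuming the 44100 Hz sample rate and the 44032-sample window used in the code above:

void main() {
  const sampleRate = 44100;
  const original = 44032; // ~1000 ms window expected by the Teachable Machine model
  for (final divisor in [2, 4, 8]) {
    final samples = original ~/ divisor;
    final ms = samples * 1000 / sampleRate;
    print('1/$divisor -> recordingLength: $samples (~${ms.toStringAsFixed(0)} ms)');
  }
}

This prints roughly 22016 (~499 ms), 11008 (~250 ms), and 5504 (~125 ms).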

@nazdream
Author

nazdream commented Jul 19, 2021

If you don't mind doing the testing for me, I suggest reducing the recordingLength to perhaps half, a quarter, or one eighth.

I will test this in a moment and provide results here

@nazdream
Author

nazdream commented Jul 19, 2021

I have tried reducing recordingLength by factors of 2, 4, and 8, but the app crashes every time with the following error:

E/AndroidRuntime(25618): FATAL EXCEPTION: Thread-7
E/AndroidRuntime(25618): Process: , PID: 25618
E/AndroidRuntime(25618): java.lang.IllegalArgumentException: Internal error: Failed to run on the given Interpreter: tensorflow/lite/core/subgraph.cc BytesRequired number of elements overflowed.
E/AndroidRuntime(25618):
E/AndroidRuntime(25618): Node number 42 (MAX_POOL_2D) failed to prepare.
E/AndroidRuntime(25618):
E/AndroidRuntime(25618): at org.tensorflow.lite.NativeInterpreterWrapper.run(Native Method)
E/AndroidRuntime(25618): at org.tensorflow.lite.NativeInterpreterWrapper.run(NativeInterpreterWrapper.java:204)
E/AndroidRuntime(25618): at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:374)
E/AndroidRuntime(25618): at org.tensorflow.lite.Interpreter.run(Interpreter.java:332)
E/AndroidRuntime(25618): at flutter.tflite_audio.TfliteAudioPlugin.rawAudioRecognize(TfliteAudioPlugin.java:508)
E/AndroidRuntime(25618): at flutter.tflite_audio.TfliteAudioPlugin.access$300(TfliteAudioPlugin.java:54)
E/AndroidRuntime(25618): at flutter.tflite_audio.TfliteAudioPlugin$4.run(TfliteAudioPlugin.java:452)
E/AndroidRuntime(25618): at java.lang.Thread.run(Thread.java:919)

I am using a Xiaomi Redmi 7 for testing.

@Caldarie
Owner

Caldarie commented Jul 19, 2021

Can you tell me your bufferSize? It needs to match the recordingLength or be lower than it.
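
As a minimal sketch of a call that satisfies this constraint (the values here are illustrative assumptions, not a verified fix; as the following comments show, the crash persisted with this particular model even with a smaller buffer):

// Keep bufferSize at or below recordingLength.
_result = TfliteAudio.startAudioRecognition(
  numOfInferences: 100,
  inputType: 'rawAudio',
  sampleRate: 44100,
  recordingLength: 22016, // half of the original 44032-sample window
  bufferSize: 11008,      // <= recordingLength
  detectionThreshold: 0.6,
);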

@nazdream
Author

nazdream commented Jul 19, 2021

I was using 8000 buffer size

_result = TfliteAudio.startAudioRecognition(
  numOfInferences: 100,
  inputType: 'rawAudio',
  sampleRate: 44100,
  recordingLength: 22016,
  bufferSize: 2000,
  detectionThreshold: 0.6,
);

@nazdream
Author

nazdream commented Jul 19, 2021

I just tried some other combinations of recordingLength and bufferSize, but the app keeps crashing whenever I start recording audio with recordingLength != 44032. Let me know if you have any ideas on what the problem could be, please.

@Caldarie
Owner

Hi @nazdream,

I will take a look at the source code when I find some spare time. I cannot guarantee a fix, but I will keep you updated.

@nazdream
Author

Thanks!

@Caldarie
Owner

Caldarie commented Jul 20, 2021

Hi @nazdream,

After running some tests, I think an inference every 50-100 ms (or 10 to 20 inferences per recording) will be extremely taxing for mobile devices. The best I can achieve is around 200 ms per inference, and this excludes latency and recording delays. Running an inference 10 to 20 times concurrently will cause noticeable lag and could degrade the user experience, I think.

What I can do, however, is increase the sensitivity of bufferSize so that the delay for each inference is minimized; that said, this approach will not have much impact either.

Let me know your thoughts about this.

@nazdream
Author

Hi @Caldarie! Thanks for the response.
I think 200 ms will not work for me, but I can try; maybe that will be enough. Ideally, I need a solution that can detect the sound roughly every 100 ms or less. Is it technically possible to achieve something like this, or does a typical device not have enough resources for such a high frequency?

@Caldarie
Owner

Caldarie commented Jul 20, 2021

@nazdream I think it's very difficult to achieve your requirements with GTM models, considering it's a simple audio classification model.

However, I think it's possible to detect sound every 100 ms if you build your own custom model, though this may require deep knowledge of machine learning. You would also need to modify this package to fit the custom model.

@nazdream
Author

nazdream commented Jul 20, 2021

How would an inference every 50-100 ms impact the device? I only need this sound detection feature to work for 30 seconds in a row at most, so maybe it would not impact the device that much.

Can you clarify whether I understood you correctly: is it the GTM model that makes processing slow and power-consuming, and could a custom model be much faster? We can create a custom GTM model if needed, but I want to understand whether this 200 ms is predefined in the package code, or whether it depends on the model and the device benchmark and is therefore unique to every combination of device and TF model.

@Caldarie
Owner

Caldarie commented Jul 20, 2021

As I've already explained in comment 13, it's a matter of processing power.

As for building a custom model, on second thought, it may also not be possible depending on what you want. Do you want real-time results (an inference every 100 ms), or do you want the 20 results to appear after the recording is finished?

@nazdream
Author

I need real-time results

@Caldarie
Owner

Caldarie commented Jul 20, 2021

Ah, in that case, I think real-time results every 100 ms may be next to impossible, even with custom models.

I may be wrong, however. You might want to consult the developers of tflite to see if it's possible.

@nazdream
Author

nazdream commented Jul 20, 2021

I will write to them and post their answer here when they reply.

@Caldarie Is it possible to detect the number of sounds in 1-second intervals, or is it only possible to count the number of sounds by splitting the interval into smaller intervals and checking whether the specific sound is present in each of them?

Or, another option would be to see how many occurrences of a specific sound there are in a 1-second interval. If the startAudioRecognition event included the number of occurrences of the sound in the 1-second interval, that would also be perfect.
Something like this:
{hasPermission: true, inferenceTime: 71, recognitionResult: 0 Name, occurrences: 5}

@Caldarie
Owner

Apologies for the late reply.

Is it possible to detect the number of sounds in 1-second intervals

Yes, it's possible if you develop your own model. With GTM, it's impossible, considering it only outputs one result at a time.

is it only possible to count the number of sounds by splitting the interval into smaller intervals and checking whether the specific sound is present in each of them?

As explained before, this approach is possible but not advisable.

@nazdream
Author

Do I understand correctly that the model output is defined by the model itself and doesn't depend on the package?

{hasPermission=true, inferenceTime=75, recognitionResult=1 Result}

@Caldarie
Owner

Caldarie commented Jul 22, 2021

@nazdream That is correct. You want to train a model with multi-label classification. Just be aware that this will only output multiple labels, not the number of occurrences.

If you want to count the number of occurrences, you'll need to go deeper and train a Sound Event Detection model. That one requires much deeper knowledge and more time to train.

As for the package, I have yet to adapt it for models with multiple outputs. However, I am more than happy to adapt it for you if you're willing to share your model.
