Counting specific sound occurrences in the audio #13

Closed
nazdream opened this issue Jul 19, 2021 · 27 comments

Comments

@nazdream

I am trying to count the number of occurrences of a specific sound in the audio.

The problem I have is that when I call TfliteAudio.startAudioRecognition and listen to the event stream, I receive events every 1 second, and I can't find a way to increase the event frequency to receive events every 50-100 ms. Is it possible to decrease the interval duration to 50-100 ms?

Another problem I have is that event['recognitionResult'] always returns "1 Result":
result:
{hasPermission=true, inferenceTime=75, recognitionResult=1 Result}
However, there is more than one repetition of the sound I am trying to count in each 1-second interval. Should it work like this, and what does the number "1" mean? Is it the count of the sound in a single audio interval or something else?

Is it possible to implement counting of a specific sound with this package, or should I look somewhere else? Any feedback would be helpful, thanks!

@Caldarie
Owner

Hi @nazdream,

Looking at your description, it looks like you’re trying to build a Sound Event Detection model. Correct me if I’m wrong here.

As for “1 Result”, I can check what’s wrong if you’re willing to share your label text file. Let me know if this is possible.

@nazdream
Author

Sure, give me 5 minutes please. I can upload the label file and tflite model to Google Drive and share access with this Gmail address: michaeltamthiennguyen@gmail.com. Will you be able to access it from there?

@Caldarie
Owner

@nazdream No problem, that should be fine.

@nazdream
Author

I have sent you the invite. Can you check whether you received it, please?

@nazdream
Author

Here is the code I used for testing the functionality I need:

import 'package:flutter/material.dart';
import 'package:tflite_audio/tflite_audio.dart';

class TestPage extends StatefulWidget {

  @override
  _TestPageState createState() => _TestPageState();
}

class _TestPageState extends State<TestPage> {
  bool _recording = false;
  int _results = 0;
  int _events = 0;
  Stream<Map<dynamic, dynamic>> _result;

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: ListView(
        children: [
          const SizedBox(height: 30),
          Center(
            child: Text('Audio'),
          ),
          const SizedBox(height: 30),
          Center(
            child: Container(
              width: 100,
              height: 100,
              decoration: BoxDecoration(
                borderRadius: BorderRadius.circular(100),
                color: _recording ? Colors.red : Colors.blue,
              ),
              child: Center(
                child: Text(_recording ? 'Recording...' : 'Idle'),
              ),
            ),
          ),
          const SizedBox(height: 30),
          Center(
            child: Text('Results: $_results'),
          ),
          Center(
            child: Text('Events: $_events'),
          ),
          const SizedBox(height: 30),
          Padding(
            padding: const EdgeInsets.symmetric(horizontal: 20),
            child: RaisedButton(
              onPressed: _recording ? _stop : _recorder,
              child: Text(_recording ? 'Stop' : 'Record'),
            ),
          ),
        ],
      ),
    );
  }

  void _recorder() {
    if (!_recording) {
      setState(() {
        _recording = true;
        _results = 0;
        _events = 0;
      });

      // Start streaming recognition; the plugin emits one event per inference.
      _result = TfliteAudio.startAudioRecognition(
        numOfInferences: 100,
        inputType: 'rawAudio',
        sampleRate: 44100,
        recordingLength: 44032,
        bufferSize: 22050,
        averageWindowDuration: 10,
        detectionThreshold: 0.6,
        suppressionTime: 10,
        minimumTimeBetweenSamples: 10,
      );

      _result.listen((event) {
        // Count every event, and separately every event recognised as the target label.
        setState(() {
          _events++;
        });
        if (event['recognitionResult'] == '1 Punch') {
          setState(() {
            _results++;
          });
        }
      }).onDone(() {
        setState(() {
          _recording = false;
        });
      });
    }
  }

  void _stop() {
    TfliteAudio.stopAudioRecognition();
  }
}

And in the app file I am initializing the model:

void _loadTFModel() async {
  String result = await TfliteAudio.loadModel(
    label: 'assets/ml/labels.txt',
    model: 'assets/ml/soundclassifier.tflite',
    numThreads: 2,
    isAsset: true,
  );
}

@Caldarie
Owner

Ah, it seems you are using Google's Teachable Machine.

So, just to clarify a few more things, you want to reduce the recording length from 1000ms to around 50-100ms? Is that correct?

@nazdream
Author

Yes

@nazdream
Author

nazdream commented Jul 19, 2021

Or, another option would be to see how many occurrences of a specific sound there are in a 1-second interval. If the startAudioRecognition event included the number of occurrences of the sound in the 1-second interval, that would also be perfect.
Something like this:
{hasPermission: true, inferenceTime: 71, recognitionResult: 0 Name, occurrences: 5}

@Caldarie
Owner

Caldarie commented Jul 19, 2021

Yes

In that case, I think it may be possible to do so. However, I have yet to test whether it works.

If you don't mind doing the testing for me, I suggest reducing the recordingLength to perhaps half, a quarter, or one eighth.

Let me know how it goes.

Or, another option would be to see how many occurrences of a specific sound there are in a 1-second interval. If the startAudioRecognition event included the number of occurrences of the sound in the 1-second interval, that would also be perfect.
Something like this:
{hasPermission: true, inferenceTime: 71, recognitionResult: 0 Name, occurrences: 5}

This may be possible, but I may need to change the source code a bit to achieve this effect.
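
For reference, here is a rough sketch (not code from the thread) of what those suggested reductions work out to, assuming the 44100 Hz sample rate and the 44032-sample window used in the code above:

void main() {
  const sampleRate = 44100;
  const original = 44032; // ~1000 ms window expected by the Teachable Machine model
  for (final divisor in [2, 4, 8]) {
    final samples = original ~/ divisor;
    final ms = samples * 1000 / sampleRate;
    print('1/$divisor -> recordingLength: $samples (~${ms.toStringAsFixed(0)} ms)');
  }
}

This prints roughly 22016 (~499 ms), 11008 (~250 ms), and 5504 (~125 ms).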

@nazdream
Author

nazdream commented Jul 19, 2021

If you don't mind doing the testing for me, I suggest reducing the recordingLength to perhaps half, a quarter, or one eighth.

I will test this in a moment and provide results here

@nazdream
Author

nazdream commented Jul 19, 2021

I have tried reducing recordingLength by factors of 2, 4, and 8, but the app crashes every time with the following error:

E/AndroidRuntime(25618): FATAL EXCEPTION: Thread-7
E/AndroidRuntime(25618): Process: , PID: 25618
E/AndroidRuntime(25618): java.lang.IllegalArgumentException: Internal error: Failed to run on the given Interpreter: tensorflow/lite/core/subgraph.cc BytesRequired number of elements overflowed.
E/AndroidRuntime(25618):
E/AndroidRuntime(25618): Node number 42 (MAX_POOL_2D) failed to prepare.
E/AndroidRuntime(25618):
E/AndroidRuntime(25618): at org.tensorflow.lite.NativeInterpreterWrapper.run(Native Method)
E/AndroidRuntime(25618): at org.tensorflow.lite.NativeInterpreterWrapper.run(NativeInterpreterWrapper.java:204)
E/AndroidRuntime(25618): at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:374)
E/AndroidRuntime(25618): at org.tensorflow.lite.Interpreter.run(Interpreter.java:332)
E/AndroidRuntime(25618): at flutter.tflite_audio.TfliteAudioPlugin.rawAudioRecognize(TfliteAudioPlugin.java:508)
E/AndroidRuntime(25618): at flutter.tflite_audio.TfliteAudioPlugin.access$300(TfliteAudioPlugin.java:54)
E/AndroidRuntime(25618): at flutter.tflite_audio.TfliteAudioPlugin$4.run(TfliteAudioPlugin.java:452)
E/AndroidRuntime(25618): at java.lang.Thread.run(Thread.java:919)

I am using a Xiaomi Redmi 7 for testing.

@Caldarie
Owner

Caldarie commented Jul 19, 2021

Can you tell me your bufferSize? It needs to match the recordingLength or be lower than it.
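
As a minimal sketch of a call that satisfies this constraint (the values here are illustrative assumptions, not a verified fix; as the following comments show, the crash persisted with this particular model even with a smaller buffer):

// Keep bufferSize at or below recordingLength.
_result = TfliteAudio.startAudioRecognition(
  numOfInferences: 100,
  inputType: 'rawAudio',
  sampleRate: 44100,
  recordingLength: 22016, // half of the original 44032-sample window
  bufferSize: 11008,      // <= recordingLength
  detectionThreshold: 0.6,
);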

@nazdream
Author

nazdream commented Jul 19, 2021

I was using 8000 buffer size

_result = TfliteAudio.startAudioRecognition(
  numOfInferences: 100,
  inputType: 'rawAudio',
  sampleRate: 44100,
  recordingLength: 22016,
  bufferSize: 2000,
  detectionThreshold: 0.6,
);

@nazdream
Author

nazdream commented Jul 19, 2021

I just tried some other combinations of recordingLength and bufferSize, but the app keeps crashing whenever I start recording audio with recordingLength != 44032. Let me know if you have any ideas on what the problem could be, please.

@Caldarie
Owner

Hi @nazdream,

I will take a look at the source code when I find some spare time. I cannot guarantee a fix, but I will keep you updated.

@nazdream
Author

Thanks!

@Caldarie
Owner

Caldarie commented Jul 20, 2021

Hi @nazdream,

After running some tests, I think an inference every 50-100 ms (or 10 to 20 inferences per recording) will be extremely taxing for mobile devices. The best I can achieve is around 200 ms per inference, and this excludes latency and recording delays. Running an inference 10 to 20 times concurrently will cause noticeable lag and could degrade the user experience, I think.

What I can do, however, is increase the sensitivity of bufferSize so that the delay for each inference is minimized; that said, this approach will not have much impact either.

Let me know your thoughts about this.

@nazdream
Author

Hi @Caldarie! Thanks for the response.
I think 200 ms will not work for me, but I can try; maybe that will be enough. Ideally, I need a solution that can detect the sound roughly every 100 ms or less. Is it technically possible to achieve something like this, or does a typical device not have enough resources for such a high frequency?

@Caldarie
Owner

Caldarie commented Jul 20, 2021

@nazdream I think it's very difficult to achieve your requirements with GTM models, considering it's a simple audio classification model.

However, I think it's possible to detect sound every 100 ms if you build your own custom model, though this may require deep knowledge of machine learning. You would also need to modify this package to fit the custom model.

@nazdream
Author

nazdream commented Jul 20, 2021

How would an inference every 50-100 ms impact the device? I only need this sound detection feature to work for 30 seconds in a row at most, so maybe it would not impact the device that much.

Can you clarify whether I understood you correctly: is it the GTM model that makes processing slow and power-consuming, and could a custom model be much faster? We can create a custom GTM model if needed, but I want to understand whether this 200 ms is predefined in the package code, or whether it depends on the model and the device benchmark and is therefore unique to every combination of device and TF model.

@Caldarie
Owner

Caldarie commented Jul 20, 2021

As I've already explained in comment 13, it's a matter of processing power.

As for building a custom model, on second thought, it may also not be possible depending on what you want. Do you want real-time results (an inference every 100 ms), or do you want the 20 results to appear after the recording is finished?

@nazdream
Author

I need real-time results

@Caldarie
Owner

Caldarie commented Jul 20, 2021

Ah, in that case, I think real-time results every 100 ms may be next to impossible, even with custom models.

I may be wrong, however. You might want to consult the developers of tflite to see if it's possible.

@nazdream
Author

nazdream commented Jul 20, 2021

I will write to them and post their answer here when they reply.

@Caldarie Is it possible to detect the number of sounds in 1-second intervals, or is it only possible to count the number of sounds by splitting the interval into smaller intervals and checking whether the specific sound is present in each of them?

Or, another option would be to see how many occurrences of a specific sound there are in a 1-second interval. If the startAudioRecognition event included the number of occurrences of the sound in the 1-second interval, that would also be perfect.
Something like this:
{hasPermission: true, inferenceTime: 71, recognitionResult: 0 Name, occurrences: 5}

@Caldarie
Owner

Apologies for the late reply.

Is it possible to detect the number of sounds in 1-second intervals

Yes, it's possible if you develop your own model. With GTM, it's impossible, considering it only outputs one result at a time.

is it only possible to count the number of sounds by splitting the interval into smaller intervals and checking whether the specific sound is present in each of them?

As explained before, this approach is possible but not advisable.

@nazdream
Author

Do I understand correctly that the model output is defined by the model itself and doesn't depend on the package?

{hasPermission=true, inferenceTime=75, recognitionResult=1 Result}

@Caldarie
Owner

Caldarie commented Jul 22, 2021

@nazdream That is correct. You want to train a model with multi-label classification. Just be aware that this will only output multiple labels, not the number of occurrences.

If you want to count the number of occurrences, you'll need to go deeper and train a Sound Event Detection model. That one requires much deeper knowledge and more time to train.

As for the package, I have yet to adapt it for models with multiple outputs. However, I am more than happy to adapt it for you if you're willing to share your model.
