Finish local VAD integration: preprocess audio before it is sent to Deepgram/server #321

**Is your feature request related to a problem? Please describe.**
To save bandwidth, we should determine whether the current 30-second chunk of audio bytes contains voice before it is processed.

**Describe the solution you'd like**
Run a voice activity detection (VAD) model locally and skip audio that contains no voice. Starting point, `vad.dart`:

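```dart
import 'dart:async';
import 'dart:io';
import 'dart:typed_data';

import 'package:flutter/foundation.dart';
import 'package:flutter/services.dart';
import 'package:fonnx/models/sileroVad/silero_vad.dart';
import 'package:path/path.dart' as path;
import 'package:path_provider/path_provider.dart' as path_provider;

class VadUtil {
  SileroVad? vad;

  // Recurrent state returned by the previous inference call and fed back
  // into the next one. Both start as null, i.e. "no previous state".
  dynamic hn;
  dynamic cn;

  /// Copies the model out of the asset bundle (if needed) and loads it.
  Future<void> init() async {
    final modelPath = await getModelPath('silero_vad.onnx');
    vad = SileroVad.load(modelPath);
  }

  /// Returns true if [bytes] likely contains voice.
  /// Fails open: if the model has not been loaded, every chunk counts as voice.
  Future<bool> predict(Uint8List bytes) async {
    if (vad == null) return true;
    final result = await vad!.doInference(bytes, previousState: {
      'hn': hn,
      'cn': cn,
    });
    hn = result['hn'];
    cn = result['cn'];
    final probability = result['output'][0];
    debugPrint('VAD speech probability: $probability');
    // What's the right threshold? Silero's reference get_speech_timestamps()
    // defaults to 0.5; 0.1 is a more permissive value that still needs tuning.
    return probability > 0.1;
  }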

  Future<String> getModelPath(String modelFilenameWithExtension) async {
    if (kIsWeb) {
      return 'assets/$modelFilenameWithExtension';
    }
    final assetCacheDirectory = await path_provider.getApplicationSupportDirectory();
    final modelPath = path.join(assetCacheDirectory.path, modelFilenameWithExtension);

    File file = File(modelPath);
    bool fileExists = await file.exists();
    final fileLength = fileExists ? await file.length() : 0;

    // Do not use the path package (path.join) for asset paths.
    // After testing on Windows, it appears that asset paths are _always_
    // Unix style (i.e. they use /), but path.join uses \ on Windows.
    final assetPath = 'assets/${path.basename(modelFilenameWithExtension)}';
    final assetByteData = await rootBundle.load(assetPath);
    final assetLength = assetByteData.lengthInBytes;
    final fileSameSize = fileLength == assetLength;
    if (!fileExists || !fileSameSize) {
      debugPrint('Copying model to $modelPath. Why? Either the file does not exist (${!fileExists}), '
          'or it does exist but is not the same size as the one in the assets '
          'directory. (${!fileSameSize})');
      debugPrint('About to get byte data for $modelPath');

      List<int> bytes = assetByteData.buffer.asUint8List(
        assetByteData.offsetInBytes,
        assetByteData.lengthInBytes,
      );
      debugPrint('About to copy model to $modelPath');
      try {
        if (!fileExists) {
          await file.create(recursive: true);
        }
        await file.writeAsBytes(bytes, flush: true);
      } catch (e) {
        debugPrint('Error writing bytes to $modelPath: $e');
        rethrow;
      }
      debugPrint('Copied model to $modelPath');
    }

    return modelPath;
  }
}
```

In `transcript.dart`, this check should run before `transcribeAudioFile2` is called.
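Silero VAD is designed to score short windows (its documentation uses 512 samples at 16 kHz) rather than a whole 30-second buffer, so the check in `transcript.dart` would likely need to walk the chunk frame by frame. A minimal sketch, assuming the chunk is raw 16 kHz, 16-bit mono PCM (any WAV header already stripped); `chunkHasVoice` is a hypothetical helper, not part of fonnx:

```dart
import 'dart:typed_data';

/// Hypothetical gate for transcript.dart: runs [isVoiced] over consecutive
/// frames of [chunk] and returns true as soon as one frame is voiced.
/// [frameBytes] defaults to 1024, i.e. 512 samples of 16-bit mono PCM
/// (an assumed format). A trailing partial frame is ignored.
Future<bool> chunkHasVoice(
  Uint8List chunk,
  Future<bool> Function(Uint8List frame) isVoiced, {
  int frameBytes = 1024,
}) async {
  for (var start = 0; start + frameBytes <= chunk.length; start += frameBytes) {
    final frame = Uint8List.sublistView(chunk, start, start + frameBytes);
    // Stop at the first voiced frame; no need to score the rest.
    if (await isVoiced(frame)) return true;
  }
  return false;
}
```

Usage would be along the lines of `if (await chunkHasVoice(pcmBytes, vadUtil.predict)) { /* existing transcribeAudioFile2 call */ }`, where `vadUtil` is an initialized `VadUtil` and `pcmBytes` is the chunk (both names hypothetical). Because `predict` returns true when the model is not loaded, the gate fails open and behaves like today if `init()` was never called.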

![CleanShot 2024-06-20 at 11 14 05@2x](https://github.com/BasedHardware/Friend/assets/18078879/3c5166ff-53ed-4f06-9ac8-8f692ad2e3db)

I used https://github.com/Telosnex/fonnx to test `vad.dart`. If you know of another library, that might be better, since I think this one may require a license. Speed matters most, though, so if this one works, it is fine.

**Describe alternatives you've considered**

**Part 1:** Every 30 seconds, determine whether the audio bytes contain voice. If they don't, skip the transcription request and discard those 30 seconds of bytes.
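Part 1 can be sketched as a buffer that emits a chunk once 30 seconds have accumulated, plus a step that sends the chunk only when a VAD check says it contains voice. This assumes raw 16 kHz, 16-bit mono PCM (so one chunk is 16000 × 2 × 30 = 960,000 bytes); `VadChunkBuffer` and `sendIfVoiced` are hypothetical names, and `hasVoice`/`send` stand in for the VAD check and the existing upload call:

```dart
import 'dart:typed_data';

/// Hypothetical buffer for Part 1: collects streamed audio until a full
/// chunk (30 s by default) is available. Assumes raw 16-bit mono PCM.
class VadChunkBuffer {
  VadChunkBuffer({
    this.sampleRate = 16000,
    this.bytesPerSample = 2,
    this.seconds = 30,
  });

  final int sampleRate;
  final int bytesPerSample;
  final int seconds;
  final BytesBuilder _pending = BytesBuilder();

  /// Size of one chunk in bytes (960,000 with the defaults).
  int get chunkBytes => sampleRate * bytesPerSample * seconds;

  /// Appends [bytes]; returns a full chunk once enough audio is buffered,
  /// otherwise null. Any overflow is kept for the next chunk.
  Uint8List? add(Uint8List bytes) {
    _pending.add(bytes);
    if (_pending.length < chunkBytes) return null;
    final all = _pending.takeBytes();
    if (all.length > chunkBytes) {
      _pending.add(Uint8List.sublistView(all, chunkBytes));
    }
    return Uint8List.sublistView(all, 0, chunkBytes);
  }
}

/// Sends [chunk] only if [hasVoice] reports speech; otherwise the bytes are
/// simply dropped. Returns whether the chunk was sent.
Future<bool> sendIfVoiced(
  Uint8List chunk,
  Future<bool> Function(Uint8List chunk) hasVoice,
  Future<void> Function(Uint8List chunk) send,
) async {
  if (!await hasVoice(chunk)) return false;
  await send(chunk);
  return true;
}
```

Dropping a silent chunk here means it is never uploaded, which is where the bandwidth saving comes from; the VAD itself still runs on every chunk on-device.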

**Part 2:** A more optimized approach: process audio in 3-second windows and drop the windows without voice.
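Part 2 could look roughly like the following: split the audio into 3-second windows, score each one, and concatenate only the voiced windows before upload. Again this assumes raw 16 kHz, 16-bit mono PCM, and `keepVoicedWindows` is a hypothetical helper whose `hasVoice` callback stands in for the VAD check:

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Hypothetical Part 2 filter: splits [pcm] into fixed windows (3 s by
/// default) and keeps only the windows that [hasVoice] reports as voiced.
/// Silent windows are dropped from the returned buffer.
Future<Uint8List> keepVoicedWindows(
  Uint8List pcm,
  Future<bool> Function(Uint8List window) hasVoice, {
  int sampleRate = 16000,
  int bytesPerSample = 2,
  int windowSeconds = 3,
}) async {
  final windowBytes = sampleRate * bytesPerSample * windowSeconds;
  final kept = BytesBuilder();
  for (var start = 0; start < pcm.length; start += windowBytes) {
    // The last window may be shorter than windowBytes.
    final end = math.min(start + windowBytes, pcm.length);
    final window = Uint8List.sublistView(pcm, start, end);
    if (await hasVoice(window)) kept.add(window);
  }
  return kept.takeBytes();
}
```

Two trade-offs worth checking: removing windows shortens the audio, so transcript timestamps would no longer line up with the original recording time, and fixed 3-second cuts can clip the start or end of a word at a window boundary (keeping one neighboring window as padding is one possible mitigation).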

