Finish local VAD integration: preprocess audio before it is sent to Deepgram/server #321

@josancamon19

Is your feature request related to a problem? Please describe.
To save bandwidth, we should determine whether the current 30-second chunk of audio bytes contains voice before it is processed.

Describe the solution you'd like
vad.dart


import 'dart:async';
import 'dart:io';

import 'package:flutter/foundation.dart';
import 'package:flutter/services.dart';
import 'package:fonnx/models/sileroVad/silero_vad.dart';
import 'package:path/path.dart' as path;
import 'package:path_provider/path_provider.dart' as path_provider;

class VadUtil {
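  SileroVad? vad;

  // Recurrent state returned by the previous inference call; it is passed
  // back in on the next call.
  dynamic hn;
  dynamic cn;

  Future<void> init() async {
    final modelPath = await getModelPath('silero_vad.onnx');
    vad = SileroVad.load(modelPath);
  }

  /// Returns true when the chunk likely contains voice.
  Future<bool> predict(Uint8List bytes) async {
    // If the model has not been loaded, assume voice so no audio is dropped.
    if (vad == null) return true;
    final result = await vad!.doInference(bytes, previousState: {
      'hn': hn,
      'cn': cn,
    });
    hn = result['hn'];
    cn = result['cn'];
    debugPrint('Result output: ${result['output'][0]}');
    return result['output'][0] > 0.1; // what's the right threshold?
  }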

  Future<String> getModelPath(String modelFilenameWithExtension) async {
    if (kIsWeb) {
      return 'assets/$modelFilenameWithExtension';
    }
    final assetCacheDirectory = await path_provider.getApplicationSupportDirectory();
    final modelPath = path.join(assetCacheDirectory.path, modelFilenameWithExtension);

    File file = File(modelPath);
    bool fileExists = await file.exists();
    final fileLength = fileExists ? await file.length() : 0;

    // Do not use path package / path.join for paths.
    // After testing on Windows, it appears that asset paths are _always_ Unix style, i.e.
    // use /, but path.join uses \ on Windows.
    final assetPath = 'assets/${path.basename(modelFilenameWithExtension)}';
    final assetByteData = await rootBundle.load(assetPath);
    final assetLength = assetByteData.lengthInBytes;
    final fileSameSize = fileLength == assetLength;
    if (!fileExists || !fileSameSize) {
      debugPrint('Copying model to $modelPath. Why? Either the file does not exist (${!fileExists}), '
          'or it does exist but is not the same size as the one in the assets '
          'directory. (${!fileSameSize})');
      debugPrint('About to get byte data for $modelPath');

      List<int> bytes = assetByteData.buffer.asUint8List(
        assetByteData.offsetInBytes,
        assetByteData.lengthInBytes,
      );
      debugPrint('About to copy model to $modelPath');
      try {
        if (!fileExists) {
          await file.create(recursive: true);
        }
        await file.writeAsBytes(bytes, flush: true);
      } catch (e) {
        debugPrint('Error writing bytes to $modelPath: $e');
        rethrow;
      }
      debugPrint('Copied model to $modelPath');
    }

    return modelPath;
  }
}

In transcript.dart, this check should run before transcribeAudioFile2 is called.
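
As a rough sketch of that gating step, something like the following could sit in front of the transcription call. The names `transcribeIfVoiced`, `VoiceCheck`, and `Transcribe` are hypothetical: `hasVoice` stands in for `VadUtil.predict`, and `transcribe` stands in for `transcribeAudioFile2`, whose real signature is not shown in this issue.

```dart
import 'dart:typed_data';

// Hypothetical callback types. `VoiceCheck` mirrors VadUtil.predict;
// `Transcribe` is an assumed shape for transcribeAudioFile2.
typedef VoiceCheck = Future<bool> Function(Uint8List bytes);
typedef Transcribe = Future<String> Function(Uint8List bytes);

/// Runs the voice check first and only calls [transcribe] when voice is found.
/// Returns the transcript, or null when the chunk was skipped.
Future<String?> transcribeIfVoiced(
  Uint8List chunk,
  VoiceCheck hasVoice,
  Transcribe transcribe,
) async {
  // Nothing to send for an empty chunk.
  if (chunk.isEmpty) return null;

  // Silent chunk: skip the request entirely, so no bytes are uploaded.
  if (!await hasVoice(chunk)) return null;

  return await transcribe(chunk);
}
```

Passing the check and the transcription call in as callbacks keeps the sketch independent of fonnx, so the VAD library can be swapped without touching the gating logic.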

[Screenshot: CleanShot 2024-06-20 at 11 14 05@2x]

I used https://github.com/Telosnex/fonnx to test the vad.dart file. If you can help with a different library, that would be better, because I think this one needs a license or something similar. Faster is better, though, so if this one works, it is okay.

Describe alternatives you've considered

Part 1: Every 30 seconds, determine whether there is voice in the audio bytes. If there is not, skip the transcribe audio request and discard those 30 seconds of bytes.

Part 2: A more optimized way of doing VAD. What I'm thinking of is processing every 3 seconds of audio and ignoring the bytes without voice.
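
A minimal sketch of the windowed approach in Part 2, under assumptions the issue does not state: the audio is taken to be 16 kHz, 16-bit mono PCM, and `hasVoice` is a stand-in for `VadUtil.predict`. The helper names (`bytesForSeconds`, `splitIntoWindows`, `keepVoiced`) are hypothetical.

```dart
import 'dart:typed_data';

// Assumed audio format (not stated in the issue): 16 kHz, 16-bit mono PCM.
const int sampleRate = 16000;
const int bytesPerSample = 2;

/// Number of bytes that cover [seconds] of audio in the assumed format.
int bytesForSeconds(int seconds) => seconds * sampleRate * bytesPerSample;

/// Splits [audio] into consecutive windows of [windowBytes] bytes.
/// The last window may be shorter than the others.
List<Uint8List> splitIntoWindows(Uint8List audio, int windowBytes) {
  if (windowBytes <= 0) {
    throw ArgumentError.value(windowBytes, 'windowBytes', 'must be positive');
  }
  final windows = <Uint8List>[];
  for (var start = 0; start < audio.length; start += windowBytes) {
    final end = (start + windowBytes < audio.length)
        ? start + windowBytes
        : audio.length;
    // A view over the original buffer; no bytes are copied here.
    windows.add(Uint8List.sublistView(audio, start, end));
  }
  return windows;
}

/// Keeps only the windows for which [hasVoice] returns true and
/// concatenates them in their original order.
Future<Uint8List> keepVoiced(
  Uint8List audio,
  int windowBytes,
  Future<bool> Function(Uint8List window) hasVoice,
) async {
  final kept = BytesBuilder();
  for (final window in splitIntoWindows(audio, windowBytes)) {
    if (await hasVoice(window)) kept.add(window);
  }
  return kept.toBytes();
}
```

Under the assumed format, a 3-second window is `bytesForSeconds(3)`, i.e. 96,000 bytes, and a 30-second chunk is 960,000 bytes. Part 1 corresponds to calling `keepVoiced` with a single 30-second window; Part 2 uses the 3-second window so that silence inside an otherwise voiced chunk is also dropped.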
