# [SubsAI](https://github.com/abdeladim-s/subsai): Voice Activity Detection (VAD) example

If you have any issues, questions or suggestions, post a new issue [here](https://github.com/abdeladim-s/subsai/issues) or start a new discussion [here](https://github.com/abdeladim-s/subsai/discussions).

## This notebook shows how to use subsai with a Voice Activity Detector (VAD); this example uses [silero-vad](https://github.com/snakers4/silero-vad).

* ### The idea is to use the VAD to cut long media files into speech chunks and transcribe them one at a time.
* ### Progress is saved after each processed chunk, so you can interrupt the process at any time and resume it later.
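The resume idea can be sketched in plain Python. `run_with_checkpoint` below is a hypothetical helper, not part of subsai or silero. Like the notebook, it pickles the item list together with the index of the last finished item. It writes the checkpoint only after an item succeeds, so an item that was interrupted or failed is retried on the next run instead of being skipped.

```python
import os
import pickle
import tempfile
from pathlib import Path


def run_with_checkpoint(items, state_path, process):
    """Process `items` in order, checkpointing (items, last_finished_index).

    Hypothetical helper: on resume, the stored item list replaces `items`,
    mirroring how the notebook reloads its stored speech timestamps.
    """
    state = Path(state_path)
    if state.exists():
        with open(state, 'rb') as f:
            items, last_done = pickle.load(f)
    else:
        last_done = -1
    results = []
    for idx in range(last_done + 1, len(items)):
        # If process() raises (or the user hits Ctrl+C), the checkpoint is left unchanged
        results.append(process(items[idx]))
        with open(state, 'wb') as f:
            pickle.dump((items, idx), f)
    state.unlink(missing_ok=True)  # all done: nothing left to resume
    return results


# Simulate an interruption on the first attempt at item 'c'
calls = []

def flaky(x):
    calls.append(x)
    if x == 'c' and calls.count('c') == 1:
        raise RuntimeError('interrupted')
    return x.upper()

state_file = os.path.join(tempfile.mkdtemp(), 'state.pkl')
try:
    run_with_checkpoint(['a', 'b', 'c', 'd'], state_file, flaky)
except RuntimeError:
    print("first run interrupted")
second_run = run_with_checkpoint(['a', 'b', 'c', 'd'], state_file, flaky)
print(second_run)  # ['C', 'D']
```

The second run picks up at `'c'`, the item that failed, rather than at the item after it.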

# Dependencies

In [None]:
!apt install -y ffmpeg
!pip install jedi
!pip install git+https://github.com/abdeladim-s/subsai.git
!pip install -q torchaudio

# Imports

In [1]:
import os
import torch
from pathlib import Path
import pysubs2
from pysubs2 import SSAFile
import pickle
from subsai import SubsAI


# Global vars

In [2]:
SAMPLING_RATE = 16000
media_file = '../assets/video/test0.webm'
subs_file = media_file + '.srt'
chunks_object_path = media_file + '-chunks.pkl'
transcription_model = 'ggerganov/whisper.cpp'
transcription_configs = {'model_type': 'base'}
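`SAMPLING_RATE` matters because silero's `get_speech_timestamps(..., return_seconds=False)` returns segment boundaries as sample indices at that rate, while pysubs2 stores subtitle times as integer milliseconds. The following is a minimal conversion sketch. The helper name `samples_to_ms` and the example segment are hypothetical.

```python
SAMPLING_RATE = 16000  # same value as in the notebook


def samples_to_ms(n_samples: int, sampling_rate: int = SAMPLING_RATE) -> int:
    """Convert a sample index into whole milliseconds (pysubs2's time unit)."""
    return round(n_samples * 1000 / sampling_rate)


# A hypothetical speech segment from 1.5 s to 3.25 s, expressed in samples
ts = {'start': 24000, 'end': 52000}
print(samples_to_ms(ts['start']), samples_to_ms(ts['end']))  # 1500 3250
```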

# Load silero

In [None]:
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad',
                              force_reload=True,
                              onnx=False)

(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils
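As we understand silero's utilities, `collect_chunks(timestamps, wav)` concatenates the samples `wav[start:end]` of every segment in `timestamps`. Because the notebook passes a one-element slice, `speech_timestamps[chunk_idx: chunk_idx+1]`, it extracts exactly one speech segment per chunk. The sketch below reproduces that behavior with plain lists. `collect_chunks_sketch` is a hypothetical stand-in for the torch-based original.

```python
def collect_chunks_sketch(timestamps, wav):
    """Concatenate the samples covered by each {'start', 'end'} segment (end exclusive)."""
    out = []
    for ts in timestamps:
        out.extend(wav[ts['start']:ts['end']])
    return out


wav = list(range(10))  # stand-in for the audio samples
segments = [{'start': 2, 'end': 4}, {'start': 7, 'end': 9}]
print(collect_chunks_sketch(segments, wav))       # [2, 3, 7, 8]
print(collect_chunks_sketch(segments[1:2], wav))  # [7, 8]: a single segment, as in the notebook
```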

# Load Transcription model

In [None]:
subs_ai = SubsAI()
tr_model = subs_ai.create_model(transcription_model, transcription_configs)

# Transcribe the media chunk by chunk

In [11]:
wav = read_audio(media_file, sampling_rate=SAMPLING_RATE)

if Path(chunks_object_path).exists():
    # resume: reuse the stored VAD timestamps and the index of the last finished chunk
    print("Loading stored chunks object ...")
    with open(chunks_object_path, 'rb') as f:
        speech_timestamps, chunk_idx_start = pickle.load(f)
    # load the subtitles written so far
    if Path(subs_file).exists():
        print(f"Loading subtitles file {subs_file}...")
        subs = pysubs2.load(subs_file)
    else:
        subs = SSAFile()
else:
    # fresh run: detect speech segments (in samples) and start with empty subtitles
    print("No chunks object found, running silero ...")
    speech_timestamps = get_speech_timestamps(wav, model, return_seconds=False, sampling_rate=SAMPLING_RATE)
    chunk_idx_start = -1
    subs = SSAFile()

for chunk_idx in range(chunk_idx_start + 1, len(speech_timestamps)):
    ts = speech_timestamps[chunk_idx]
    start, end = ts['start'], ts['end']
    print(f"Processing chunk {chunk_idx}...")
    chunk_path = media_file + f"-chunk-{start}-{end}.wav"
    try:
        # write this speech segment to a temporary wav file and transcribe it
        save_audio(chunk_path, collect_chunks(speech_timestamps[chunk_idx:chunk_idx + 1], wav), sampling_rate=SAMPLING_RATE)
        chunk_subs = subs_ai.transcribe(chunk_path, tr_model)
    except Exception as e:
        # stop without checkpointing, so the next run retries this chunk
        print(f"Chunk {chunk_idx} failed: {e}")
        break
    finally:
        # clean up the temporary wav file
        if os.path.exists(chunk_path):
            os.remove(chunk_path)

    # chunk subtitles start at 0: shift them by the chunk offset (pysubs2 times are in ms)
    offset_ms = round(start * 1000 / SAMPLING_RATE)
    for sub in chunk_subs:
        sub.start += offset_ms
        sub.end += offset_ms
        subs.append(sub)

    # checkpoint only after the chunk has been transcribed successfully
    print(f"Saving subtitles file {subs_file} ...")
    subs.save(subs_file)
    print("Saving chunks object ...")
    with open(chunks_object_path, 'wb') as f:
        pickle.dump((speech_timestamps, chunk_idx), f)
else:
    # every chunk has been processed: the checkpoint is no longer needed
    Path(chunks_object_path).unlink(missing_ok=True)
    print("Done :)")

Loading stored chunks object ...
Loading subtitles file ../assets/video/test0.webm.srt...
Done :)
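Each chunk is transcribed on its own, so its subtitle times start at 0. They must be shifted by the chunk's start position before being merged into the global subtitles. The sketch below uses a hypothetical `Event` dataclass in place of `pysubs2.SSAEvent`, so it runs without pysubs2. `merge_chunk` is also a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Stand-in for pysubs2.SSAEvent: start/end in milliseconds."""
    start: int
    end: int
    text: str


def merge_chunk(global_events, chunk_events, chunk_start_sample, sampling_rate=16000):
    """Shift chunk-relative events to absolute media time and append them."""
    offset_ms = round(chunk_start_sample * 1000 / sampling_rate)
    for ev in chunk_events:
        global_events.append(Event(ev.start + offset_ms, ev.end + offset_ms, ev.text))
    return global_events


subs = []
merge_chunk(subs, [Event(0, 1200, 'hello')], chunk_start_sample=24000)  # chunk starts at 1.5 s
merge_chunk(subs, [Event(100, 900, 'world')], chunk_start_sample=80000)  # chunk starts at 5 s
print([(e.start, e.end, e.text) for e in subs])
# [(1500, 2700, 'hello'), (5100, 5900, 'world')]
```

Working in integer milliseconds keeps the times in pysubs2's native unit. With real pysubs2 objects, `SSAFile.shift(ms=offset_ms)` on `chunk_subs` should perform the same shift in a single call.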
