This repository contains an R package which is an Rcpp wrapper around the webrtc Voice Activity Detection module.
example-vad.mp4
The package was created with as main goal to remove non-speech audio segments before doing an automatic transcription using audio.whisper to avoid transcription hallucinations. It contains
- functions to detect the location of voice in audio using a Gaussian Mixture Model implemented in webrtc
- functions to extract audio where there is voice / silence in a new audio file
- functionality to rewrite the timepoints of transcribed sentences where specific sections with non-audio are removed to make sure the timepoints of the transcriptions without silences align with the original audio signal
- The package is currently not on CRAN
- For the development version of this package:
remotes::install_github("bnosac/audio.vadwebrtc")
Look to the documentation of the functions: help(package = "audio.vadwebrtc")
Get a audio file in 16 bit with mono PCM samples (pcm_s16le codec) with a sampling rate of either 8Khz, 16KHz or 32Khz
library(audio.vadwebrtc)
file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
vad <- VAD(file, mode = "normal")
vad
Voice Activity Detection
- file: D:/Jan/R/win-library/4.1/audio.vadwebrtc/extdata/test_wav.wav
- sample rate: 16000
- VAD type: webrtc-gmm, VAD mode: normal, VAD by milliseconds: 10, VAD frame_length: 160
- Percent of audio containing a voiced signal: 90.2%
- Seconds voiced: 6.3
- Seconds unvoiced: 0.7
vad$vad_segments
vad_segment start end has_voice
1 0.00 0.08 FALSE
2 0.09 3.30 TRUE
3 3.31 3.71 FALSE
4 3.72 6.78 TRUE
5 6.79 6.99 FALSE
Example of a simple plot of these audio and voice segments
library(av)
x <- read_audio_bin(file)
plot(seq_along(x) / 16000, x, type = "l", xlab = "Seconds", ylab = "Signal")
abline(v = vad$vad_segments$start, col = "red", lwd = 2)
abline(v = vad$vad_segments$end, col = "blue", lwd = 2)
Or show it interactively alongside R package wavesurfer: wavesurfer
library(wavesurfer)
library(shiny)
file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
vad <- VAD(file, mode = "lowbitrate")
anno <- data.frame(audio_id = vad$file,
region_id = vad$vad_segments$vad_segment,
start = vad$vad_segments$start,
end = vad$vad_segments$end,
label = ifelse(vad$vad_segments$has_voice, "Voiced", "Silent"))
anno <- subset(anno, label %in% "Silent")
wavs_folder <- system.file(package = "audio.vadwebrtc", "extdata")
shiny::addResourcePath("wav", wavs_folder)
ui <- fluidPage(
wavesurferOutput("my_ws", height = "128px"),
tags$p("Press spacebar to toggle play/pause."),
)
server <- function(input, output, session) {
output$my_ws <- renderWavesurfer({
wavesurfer(audio = paste0("wav/", "test_wav.wav"), annotations = anno) %>%
ws_set_wave_color('#5511aa') %>%
ws_cursor()
})
}
shinyApp(ui = ui, server = server)
Need support in text mining? Contact BNOSAC: http://www.bnosac.be