Disfluency Detection From Audio

This repo includes a demo for running audio through the language, acoustic, and multimodal disfluency detection models. It also includes preprocessing code for the Switchboard dataset that aligns audio, transcripts, and disfluency labels (filled pauses, partial words, repetitions, revisions, and restarts) at the frame level (every 20 ms of audio).

Disfluency Detection Demo

Dependencies

The following packages are needed:

  • pandas==1.5.0
  • torch==1.12.1
  • torchaudio==0.12.1
  • transformers==4.22.2
  • whisper_timestamped==1.14.4
  • gdown==5.1.0

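These can be installed with pip, for example (assuming the package names above match their PyPI names):

pip install pandas==1.5.0 torch==1.12.1 torchaudio==0.12.1 transformers==4.22.2 whisper_timestamped==1.14.4 gdown==5.1.0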
Use gdown to download the pretrained model weights and save them to demo_models:

mkdir demo_models && cd demo_models
mkdir asr && cd asr
gdown --id 1BeT7m_5qv19Sb5yrZ2zhKu6fEprUoB9N -O config.json
gdown --id 15xQiVew2SatAL_7E5Hh30hya8x7tyGb_ -O pytorch_model.bin
cd ..
gdown --id 1GQIXgCSF3Usiuy5hkxgOl483RPX3f_SX -O language.pt
gdown --id 1wWrmopvvdhlBw-cL7EDyih9zn_IJu5Wr -O acoustic.pt
gdown --id 1LPchbScA_cuFx1XoNxpFCYZfGoJCfWao -O multimodal.pt
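After these commands, demo_models should contain:

demo_models/
├── asr/
│   ├── config.json
│   └── pytorch_model.bin
├── language.pt
├── acoustic.pt
└── multimodal.pt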

How to run the demo

Given some input.wav, we can run any of these options:

python3 demo.py --audio_file input.wav --output_file pred.csv --output_trans trans.csv --modality language
python3 demo.py --audio_file input.wav --output_file pred.csv --modality acoustic
python3 demo.py --audio_file input.wav --output_file pred.csv --modality multimodal

The three modality options are:

  • language: runs a Whisper model that has been fine-tuned for verbatim transcription, then uses the text and timestamps as input to a BERT model that has been fine-tuned for disfluency detection.
  • acoustic: runs a WavLM model that has been fine-tuned for acoustic-based disfluency detection.
  • multimodal: runs both the language and acoustic models, concatenates their embeddings, and passes them through a BLSTM fusion model.

The frame-level disfluency predictions are written to pred.csv. For the language option, the ASR transcript and word-level disfluency predictions are also written to trans.csv.
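To inspect the frame-level output, pred.csv can be loaded with pandas. A minimal sketch, assuming one row per 20 ms frame and a prediction column named label (the column names here are assumptions; check the actual header):

import pandas as pd

# Load frame-level predictions (one row per 20 ms frame; column name is assumed).
pred = pd.read_csv("pred.csv")

# Convert each frame index to a start time in seconds (20 ms hop).
pred["start_sec"] = pred.index * 0.02

# Count how many frames were assigned each disfluency label.
print(pred["label"].value_counts())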

Switchboard Preprocessing

Dependencies

The following packages are needed:

  • pandas
  • Levenshtein

Prepare the data as follows:

  • Get the Switchboard data through the LDC:
    • Copy the audio sph files to the raw_data folder (raw_data/swb_sph)
    • Copy the ms98 transcriptions to the raw_data folder (raw_data/swb_ms98_transcriptions)
  • Copy the corrected disfluency labels from Zayats et al. to the raw_data folder:
cd raw_data
wget https://raw.githubusercontent.com/vickyzayats/switchboard_corrected_reannotated/master/switchboard_corrected_with_silver_reannotation.zip
unzip switchboard_corrected_with_silver_reannotation.zip
mv switchboard_corrected_with_silver_reannotation.tsv swb_silver.tsv
rm switchboard_corrected_with_silver_reannotation.zip
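After these steps, raw_data should contain:

raw_data/
├── swb_sph/
├── swb_ms98_transcriptions/
└── swb_silver.tsv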

How to run the preprocessing code

python3 run_data_prep.py 

This will create a data folder with:

  • transcripts.csv: the text and word-level disfluency labels (FP, RP, RV, RS, PW) associated with each segment
  • wav_sil: a directory with the 8 kHz wav files associated with each segment (50 ms of silence padding on either end)
  • labels_framelevel: a directory with the frame-level labels (one label per 20 ms of audio)
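As a quick sanity check on the alignment, the number of frame-level labels for a segment should roughly match the padded audio duration divided by 20 ms. A minimal sketch, assuming one CSV of labels per segment in labels_framelevel, matching wav filenames in wav_sil, and a hypothetical segment name (the filenames and label file format are assumptions):

import wave
import pandas as pd

# Hypothetical segment name; substitute a real file from the data folder.
segment = "sw02001A_0001"

# Duration of the padded 8 kHz wav for this segment.
with wave.open(f"data/wav_sil/{segment}.wav", "rb") as w:
    duration_sec = w.getnframes() / w.getframerate()

# Frame-level labels: one label per 20 ms of audio.
labels = pd.read_csv(f"data/labels_framelevel/{segment}.csv")

# The label count should roughly match duration / 0.02.
print(len(labels), round(duration_sec / 0.02))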

Citation

This work has been submitted to IEEE Transactions on Audio, Speech and Language Processing. If you use this work in your research or projects, please cite it as follows:

@article{romana2023,
  title  = {Automatic Disfluency Detection from Untranscribed Speech},
  author = {Romana, Amrit and Koishida, Kazuhito and Mower Provost, Emily},
  year   = {2023}
}
