bma-vandijk/asr_pipelines

ASR Evaluation Pipeline for Clinical Applications with Older Adults

This repository contains the code and evaluation pipeline for the research paper: "Out of the Box, into the Clinic? Evaluating State-of-the-Art ASR for Clinical Applications for Older Adults".

Overview

This pipeline evaluates state-of-the-art Automatic Speech Recognition (ASR) models on Dutch speech from older adults, comparing their performance on both clinical conversation data (Welzijn.AI chatbot interactions) and general speech data (Mozilla Common Voice). The evaluation focuses on accuracy-speed trade-offs and model generalization capabilities.

What This Pipeline Does

  1. Audio Processing: Converts and segments audio files, then performs speaker diarization to separate the speakers in each recording
  2. ASR Evaluation: Tests multiple ASR models including:
    • Generic multilingual models (Whisper variants, Voxtral)
    • Dutch-specific models (wav2vec2-xls-r-1b-dutch-3, whisper-native-elderly-9-dutch)
  3. Performance Analysis: Computes Word Error Rate (WER) and processing time metrics
  4. Comparative Study: Evaluates models on two datasets:
    • Welzijn.AI (clinical conversations with older adults), referred to as "Beatrix" in the code
    • Mozilla Common Voice (general Dutch speech of older adults)
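
The two metrics in step 3 can be illustrated with a minimal sketch. This is not the notebook's own implementation, which is not shown here. It assumes WER is computed as the word-level edit distance divided by the number of reference words, after lowercasing and whitespace tokenization, and that processing time is wall-clock time around a transcription call. The names `word_error_rate`, `timed_transcribe`, and the `transcribe` callable are hypothetical and stand in for whatever model wrapper you use.

```python
import time
from typing import Callable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Assumption: simple normalization (lowercase, whitespace split).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def timed_transcribe(transcribe: Callable[[str], str], audio_path: str) -> Tuple[str, float]:
    """Run a (hypothetical) transcription callable and return (text, seconds elapsed)."""
    start = time.perf_counter()
    text = transcribe(audio_path)
    return text, time.perf_counter() - start
```

Because insertions count as errors, WER can exceed 1.0 when a model hallucinates more words than the reference contains.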

Key Findings

  • Generic multilingual models often outperform fine-tuned models
  • Model truncation helps balance accuracy-speed trade-offs
  • Some models show high WER due to hallucinations and mishearings

Data Privacy

Due to privacy concerns, no audio files or transcripts are provided in this repository. The pipeline is designed to work with your own audio data, organized in the structure described in the paper.

Requirements

Install dependencies with:

pip install -r requirements.txt

Usage

The main analysis is conducted through the analysis.ipynb notebook, which processes audio data, runs ASR models, and generates performance comparisons and visualizations.

Paper Reference

The paper is available as a preprint (arXiv:2508.08684) and has been accepted for publication at the HCINLP workshop @ EMNLP 2025.
