⚡ AudioBench: A Universal Benchmark for Audio Large Language Models 🚀 ⚡
⚡ A repository for evaluating AudioLLMs on various tasks 🚀 ⚡
- July 2024: We are working hard on the leaderboard and the speech translation dataset. Stay tuned!
- July 2024: Added support for all 26 datasets listed in the AudioBench manuscript.
Install the dependencies with pip:
pip install -r requirements.txt
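If you prefer an isolated setup, a typical flow is to create a virtual environment first (a minimal sketch; the repository does not mandate a specific environment manager):

```bash
# Optional: isolate dependencies in a virtual environment (assumption: any
# recent Python 3 works; check the repository for pinned versions).
python -m venv audiobench-env
source audiobench-env/bin/activate
pip install -r requirements.txt
```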
For model-as-judge evaluation, we serve the judge model as a service via vLLM on port 5000. The example below hosts a Llama-3-70B-Instruct model as the judge and evaluates the cascade Whisper + Llama-3 model.
# Step 1:
# Serve the judge model.
# The script auto-downloads the model and may require access approval from Hugging Face.
# In the demo, we use 2x H100 80GB GPUs to host the model.
# With less VRAM, you may need a smaller judge model.
bash host_model_judge_llama_3_70b_instruct.sh
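For reference, here is a minimal sketch of what such a hosting script might run, assuming vLLM's OpenAI-compatible server; the exact flags in host_model_judge_llama_3_70b_instruct.sh may differ:

```bash
# Hypothetical sketch; see host_model_judge_llama_3_70b_instruct.sh for the real flags.
# Serves Llama-3-70B-Instruct behind vLLM's OpenAI-compatible API on port 5000,
# sharded across the two GPUs reserved for the judge.
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --port 5000 \
    --tensor-parallel-size 2

# Quick sanity check that the judge service is reachable before Step 2.
curl -s http://localhost:5000/v1/models
```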
# Step 2:
# The example is run with 3x H100 80GB GPUs.
# AudioLLM inference runs on GPU 2, since GPUs 0 and 1 host the model-as-judge service.
# This configuration evaluates only 50 samples.
MODEL_NAME=whisper_large_v3_with_llama_3_8b_instruct
GPU=2
BATCH_SIZE=1
METRICS=llama3_70b_judge
OVERWRITE=True
NUMBER_OF_SAMPLES=50
DATASET=cn_college_listen_test
bash eval.sh $DATASET $MODEL_NAME $GPU $BATCH_SIZE $OVERWRITE $METRICS $NUMBER_OF_SAMPLES
# Step 3:
# The results will look like:
# "llama3_70b_judge": {
# "judge_score": 3.12,
# "success_rate": 1.0
# }
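If the results are also written to disk as JSON, the judge score can be extracted with jq (the output path below is an assumption; check eval.sh for the actual location):

```bash
# Hypothetical path; eval.sh determines where results are actually written.
jq '.llama3_70b_judge.judge_score' log/whisper_large_v3_with_llama_3_8b_instruct/cn_college_listen_test.json
```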
The example above shows how to get started. To evaluate on the full datasets, please refer to Examples.
# After the model weights are downloaded, run the evaluation script for all datasets
bash examples/eval_SALMONN_7B.sh
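A full-evaluation script like examples/eval_SALMONN_7B.sh presumably iterates over every supported dataset; below is a simplified sketch, where the model identifier, the dataset names, and the use of -1 for "all samples" are assumptions:

```bash
# Hypothetical sketch of a full run; identifiers are inferred from the dataset
# table below and may not match the repository's exact naming.
MODEL_NAME=salmonn_7b
for DATASET in librispeech_test_clean librispeech_test_other cn_college_listen_test; do
    # -1 is assumed to mean "evaluate all samples"; adjust per eval.sh.
    bash eval.sh $DATASET $MODEL_NAME 0 1 True llama3_70b_judge -1
done
```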
- SU: Speech Understanding
- ASR: Automatic Speech Recognition
- SQA: Speech Question Answering
- SI: Speech Instruction
- ASU: Audio Scene Understanding
- AC: Audio Captioning
- ASQA: Audio Scene Question Answering
- VU: Voice Understanding
- AR: Accent Recognition
- GR: Gender Recognition
- ER: Emotion Recognition
Dataset | Category | Task | Metrics | Status |
---|---|---|---|---|
LibriSpeech-Clean | SU | ASR | WER | ✅ |
LibriSpeech-Other | SU | ASR | WER | ✅ |
CommonVoice-15-EN | SU | ASR | WER | ✅ |
Peoples-Speech | SU | ASR | WER | ✅ |
GigaSpeech | SU | ASR | WER | ✅ |
Earning21 | SU | ASR | WER | ✅ |
Earning22 | SU | ASR | WER | ✅ |
Tedlium3 | SU | ASR | WER | ✅ |
Tedlium3-Longform | SU | ASR | WER | ✅ |
CN-College-Listen | SU | SQA | Model-as-Judge | ✅ |
SLUE-P2-SQA5 | SU | SQA | Model-as-Judge | ✅ |
Public-SG-SpeechQA | SU | SQA | Model-as-Judge | ✅ |
DREAM-TTS | SU | SQA | Model-as-Judge | ✅ |
OpenHermes-Audio | SU | SI | Model-as-Judge | ✅ |
ALPACA-Audio | SU | SI | Model-as-Judge | ✅ |
AudioCaps | ASU | AC | Model-as-Judge / METEOR | ✅ |
WavCaps | ASU | AC | Model-as-Judge / METEOR | ✅ |
Clotho-AQA | ASU | ASQA | Model-as-Judge | ✅ |
AudioCaps-QA | ASU | ASQA | Model-as-Judge | ✅ |
WavCaps-QA | ASU | ASQA | Model-as-Judge | ✅ |
VoxCeleb-Accent | VU | AR | Model-as-Judge | ✅ |
VoxCeleb-Gender | VU | GR | Model-as-Judge | ✅ |
IEMOCAP-Gender | VU | GR | Model-as-Judge | ✅ |
IEMOCAP-Emotion | VU | ER | Model-as-Judge | ✅ |
MELD-Sentiment | VU | ER | Model-as-Judge | ✅ |
MELD-Emotion | VU | ER | Model-as-Judge | ✅ |
Model | Size | Notes | Status |
---|---|---|---|
Whisper-Large + Llama-3-8B-Instruct | ~8B | Cascade Model | ✅ |
SALMONN-7B | ~7B | AudioLLM - Fusion Model | ✅ |
Qwen-Audio | ~8B | AudioLLM - Fusion Model | TODO |
Qwen2-Audio | ~8B | AudioLLM - Fusion Model | TODO |
If you find our work useful, please consider citing our paper!
@article{wang2024audiobench,
  title={AudioBench: A Universal Benchmark for Audio Large Language Models},
  author={Wang, Bin and Zou, Xunlong and Lin, Geyu and Sun, Shuo and Liu, Zhuohan and Zhang, Wenyu and Liu, Zhengyuan and Aw, AiTi and Chen, Nancy F},
  journal={arXiv preprint arXiv:2406.16020},
  year={2024}
}