DialogGraph-LLM: Multi-Relational Graph-Informed LLM for End-to-End Audio Dialogue Intent Recognition
📄 Paper • 🚀 Quick Start • 📊 Results
DialogGraph-LLM: A novel multimodal framework integrating Multi-Relational Dialogue Attention Network (MR-DAN) with Large Language Models for audio dialogue intent recognition.
- 2025.07.10 - 🎉 Paper accepted to ECAI 2025 (oral)!
- 2025.06.24 - 📝 Paper review scores: 7 (good, probably should be accepted), 8 (very good, top 50% of accepted papers at top AI conferences), 6 (borderline, but tending towards acceptance), and 7.
- 2025.04 - 🚀 Code is publicly available
- 🎯 State-of-the-Art Performance: Achieves 77.31% accuracy on a private MarketCalls dataset and 70.91% accuracy on the public MIntRec2.0 benchmark (IS+OS).
- 🕸️ Multi-Relational Graph Modeling (MR-DAN): Models dialogues as heterogeneous graphs, capturing complex inter-utterance dependencies through four distinct edge types: temporal, speaker, cross-turn semantic similarity, and self-loops. MR-DAN employs a specialized multi-head attention mechanism in which distinct sets of attention heads process each edge type, enabling nuanced aggregation of contextual information (a minimal sketch follows this feature list).
- 🔄 Adaptive Semi-Supervised Learning: Implements an innovative SSL strategy leveraging LLM-generated candidate predictions. This includes an Adaptive Threshold Mechanism (ATM) for dynamic, class-aware thresholding, a Δ-Margin strategy for robust high-confidence pseudo-label selection, and Class-Balanced Top-K sampling to address class imbalance and augment training data (see the pseudo-label selection sketch after the figure captions below).
- 🤖 LLM-Powered Reasoning: Built upon the Qwen2.5-Omni-7B multimodal foundation model, integrating graph-derived structural semantics and direct audio features via prompt engineering for sophisticated intent recognition (a hypothetical prompt-assembly example also follows the figure captions).
- ⚡ Efficient Architecture: Parameter-efficient fine-tuning with LoRA for practical deployment.
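To make the relation-specific attention concrete, here is a minimal PyTorch sketch, assuming utterances are already encoded as fixed-size vectors and each of the four relations is supplied as a boolean adjacency mask; the class, the head allocation, and the toy masks are illustrative assumptions rather than the repository's implementation:

```python
# Minimal sketch of relation-specific multi-head attention over a dialogue graph.
# Assumptions (not the repository's code): utterances are pre-encoded into vectors,
# each of the four relations (temporal, speaker, cross-turn similarity, self-loop)
# is given as a boolean adjacency mask, and each relation owns a disjoint set of heads.
import torch
import torch.nn as nn


class RelationAwareAttention(nn.Module):
    def __init__(self, dim: int, heads_per_relation: int = 2, num_relations: int = 4):
        super().__init__()
        self.heads_per_relation = heads_per_relation
        self.num_heads = heads_per_relation * num_relations
        self.head_dim = dim // self.num_heads
        self.qkv = nn.Linear(dim, 3 * self.num_heads * self.head_dim)
        self.out = nn.Linear(self.num_heads * self.head_dim, dim)

    def forward(self, x: torch.Tensor, rel_masks: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) utterance features; rel_masks: (num_relations, N, N) booleans,
        # where rel_masks[r, i, j] is True if utterance i may attend to j under relation r.
        n = x.size(0)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(n, self.num_heads, self.head_dim).transpose(0, 1)   # (H, N, d)
        k = k.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        v = v.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5        # (H, N, N)
        # Each block of heads only attends along its own relation's edges.
        mask = rel_masks.repeat_interleave(self.heads_per_relation, dim=0)
        attn = torch.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
        attn = torch.nan_to_num(attn)  # nodes with no edges under a relation contribute zeros
        ctx = (attn @ v).transpose(0, 1).reshape(n, -1)                # concatenate heads
        return self.out(ctx)


# Toy usage: 5 utterances, 128-dim features, self-loop and previous-utterance edges only.
x = torch.randn(5, 128)
masks = torch.zeros(4, 5, 5, dtype=torch.bool)
masks[3] = torch.eye(5, dtype=torch.bool)             # self-loop relation
masks[0, 1:, :-1] = torch.eye(4, dtype=torch.bool)    # temporal edges i -> i-1
print(RelationAwareAttention(dim=128)(x, masks).shape)  # torch.Size([5, 128])
```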
Figure 1: The DialogGraph-LLM framework integrating multimodal processing, graph-structured dialogue modeling via MR-DAN, and adaptive semi-supervised learning for robust intent recognition.
Figure 2: MR-DAN explicitly models multiple relationship types in dialogues through specialized attention mechanisms, enabling comprehensive structural understanding.
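To illustrate how the SSL components could fit together, the sketch below combines an EMA-based class-threshold update, a Δ-margin filter, and class-balanced Top-K selection; the function signature and update rule are assumptions that only loosely mirror the `--lambda_ema` and `--margin_tolerance` options in the training command further down, not the repository's actual implementation:

```python
# Illustrative sketch of adaptive pseudo-label selection for the SSL stage.
# Assumptions (not the repository's code): `probs` are per-class probabilities from the
# model on unlabeled dialogues; class thresholds are tracked with an EMA; a sample is
# kept when its confidence clears the class threshold minus a small Δ-margin tolerance;
# at most `top_k` pseudo-labels are taken per class to keep the batch class-balanced.
import numpy as np


def select_pseudo_labels(probs, thresholds, lambda_ema=0.95,
                         margin_tolerance=0.06, top_k=32):
    """probs: (num_unlabeled, num_classes); thresholds: (num_classes,)."""
    preds = probs.argmax(axis=1)
    confidence = probs.max(axis=1)

    # Adaptive Threshold Mechanism: EMA of mean confidence per predicted class.
    new_thresholds = thresholds.copy()
    for c in range(probs.shape[1]):
        in_class = preds == c
        if in_class.any():
            new_thresholds[c] = (lambda_ema * thresholds[c]
                                 + (1.0 - lambda_ema) * confidence[in_class].mean())

    # Δ-margin filter: keep samples close to or above the class-specific threshold.
    keep = confidence >= (new_thresholds[preds] - margin_tolerance)

    # Class-balanced Top-K: cap how many pseudo-labels each class contributes.
    selected = []
    for c in range(probs.shape[1]):
        idx = np.where(keep & (preds == c))[0]
        selected.extend(idx[np.argsort(-confidence[idx])][:top_k].tolist())

    selected = np.array(selected, dtype=int)
    return selected, preds[selected], new_thresholds


# Toy usage with 4 intent classes and random predictions.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(4) * 0.5, size=200)
idx, labels, thr = select_pseudo_labels(probs, thresholds=np.full(4, 0.7))
print(len(idx), thr.round(3))
```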
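As a purely hypothetical illustration of the prompt-engineering step, the snippet below serializes a graph-structure summary and a speaker-attributed transcript into a text prompt (the audio itself would be passed to the multimodal backbone separately); the function name, wording, and label set are assumptions, not the repository's template:

```python
# Hypothetical prompt assembly: one way the graph-derived structure summary and the
# speaker-attributed transcript could be serialized for the LLM. All names and wording
# here are illustrative assumptions.
def build_prompt(utterances, speakers, graph_summary, intent_labels):
    turns = "\n".join(f"[{s}] {u}" for s, u in zip(speakers, utterances))
    return (
        "You are an intent recognition assistant for audio dialogues.\n"
        f"Dialogue structure summary: {graph_summary}\n"
        f"Transcript with speakers:\n{turns}\n"
        f"Choose the caller's intent from: {', '.join(intent_labels)}.\n"
        "Answer with the single best label."
    )


print(build_prompt(
    utterances=["Hi, I'm calling about the premium plan.", "Sure, what would you like to know?"],
    speakers=["Caller", "Agent"],
    graph_summary="2 turns, 2 speakers, strong cross-turn similarity between turns 1 and 2",
    intent_labels=["Inquire", "Purchase", "Decline", "Other"],
))
```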
# Clone the repository
git clone git@github.com:david188888/DialogGraph-LLM.git
cd DialogGraph-LLM
# Create and activate a conda environment (the Python version shown is an example; adjust as needed)
conda create -n <env_name> python=3.10
conda activate <env_name>
# Install dependencies
pip install -r requirements.txt
Before training, you need to prepare the datasets:
Download the MIntRec2.0 dataset from the official repository:
# Clone the MIntRec2.0 repository
git clone https://github.com/thuiar/MIntRec2.0.git
# Create data directory and copy dataset
mkdir -p data
cp -r MIntRec2.0/data/* data/

Visit the MIntRec2.0 repository for detailed dataset information and preparation instructions.
The MarketCalls dataset is currently not publicly available as it requires privacy processing for user voice data. We are working on anonymizing the audio data while preserving the conversational patterns essential for intent recognition research.
# Train with MarketCalls dataset
python train.py \
--config configs/marketcalls_ssl.yaml \
--model_name_or_path Qwen/Qwen2.5-Omni-7B \
--output_dir outputs/dialoggraph-marketcalls \
--use_ssl \
--ssl_start_epoch 10 \
--lambda_ema 0.95 \
--margin_tolerance 0.06

Our DialogGraph-LLM achieves significant improvements over strong LLM baselines on the MarketCalls dataset:
| Model | Overall Acc (%) | Overall F1 (%) | Class A F1 (%) | Class B F1 (%) | Class C F1 (%) | Class D F1 (%) |
|---|---|---|---|---|---|---|
| Llama3.1-8B | 49.85 | 49.20 | 22.70 | 56.10 | 58.50 | 19.30 |
| GLM-4-9B | 51.75 | 51.15 | 23.60 | 58.00 | 60.30 | 20.20 |
| Gemini1.5-Pro | 53.60 | 53.00 | 24.50 | 60.00 | 62.20 | 21.20 |
| Qwen2.5-Omni | 63.58 | 63.10 | 28.50 | 72.50 | 74.30 | 24.80 |
| DialogGraph-LLM | 77.31 | 76.83 | 44.53 | 83.54 | 85.21 | 41.75 |
Key Improvements:
- +13.73% accuracy improvement over Qwen2.5-Omni baseline
- More than +16 F1 points on the minority classes (A & D) over the strongest baseline, Qwen2.5-Omni
- Consistent gains across all intent categories
Comparison with state-of-the-art multimodal intent recognition methods:
| Method | IS Acc (%) | IS F1 (%) | IS Precision (%) | IS Recall (%) | IS+OS Acc (%) | IS+OS F1 (%) |
|---|---|---|---|---|---|---|
| MulT (ACL 2019) | 60.66 | 54.12 | 58.02 | 53.77 | 56.00 | 47.35 |
| MAG-BERT (ACL 2020) | 60.58 | 55.17 | 57.78 | 55.10 | 56.20 | 48.00 |
| TCL-MAP (AAAI 2024) | 61.97 | 56.09 | 58.14 | 53.42 | - | - |
| A-MESS | 62.39 | 55.91 | 60.10 | 55.93 | 56.81 | 49.31 |
| DialogGraph-LLM | 70.91 | 66.54 | 69.12 | 64.15 | 64.28 | 58.14 |
Achievements:
- +8.52% accuracy improvement over previous SOTA
- +10.63% F1-score improvement
- Strong performance on both in-scope and out-of-scope detection
We welcome contributions from the community! Feel free to open issues or pull requests to help improve DialogGraph-LLM.
We gratefully acknowledge the following contributions and support:
- Qwen2.5-Omni: This project builds upon the outstanding Qwen2.5-Omni-7B multimodal foundation model developed by the Qwen Team at Alibaba Cloud. Qwen2.5-Omni is an end-to-end multimodal model capable of perceiving diverse modalities, including text, images, audio, and video, while generating natural speech responses. The model is released under the Apache-2.0 License, and we extend our sincere gratitude to the Qwen team for making this powerful model available to the research community.
- MIntRec2.0: We acknowledge the use of the MIntRec2.0 dataset, a large-scale benchmark for multimodal intent recognition and out-of-scope detection in multi-party conversations. This comprehensive dataset contains 15,040 utterances across 30 intent classes with text, video, and audio modalities, providing an essential resource for evaluating multimodal intent recognition approaches. The dataset is released under the CC-BY-NC-SA-4.0 License, and we are grateful to the authors for their valuable contribution to the multimodal intent recognition research community.
🌟 Star us on GitHub if you find this project helpful! 🌟

For questions, issues, or collaborations, please reach out: 📧 Email: hongyuliu@m.scnu.edu.cn or junxinli@m.scnu.edu.cn