SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems

Introduction

SpeechAgents is a multi-modal, LLM-based multi-agent system designed to simulate human communication. Unlike existing LLM-based multi-agent systems, SpeechAgents uses a multi-modal LLM as the control center of each individual agent and employs multi-modal signals as the medium for messages exchanged among agents. Additionally, we propose Multi-Agent Tuning to enhance the multi-agent capabilities of the LLM without compromising its general abilities. To strengthen and evaluate the effectiveness of human communication simulation, we build the Human-Communication Simulation Benchmark.
Demos are available on our project page. As they show, SpeechAgents can generate human-like communication dialogues with consistent content, authentic rhythm, and rich emotions, and can be applied to tasks such as drama creation and audio novel generation.

Illustration of the training and inference process of an individual agent in SpeechAgents.
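
To make the architecture described in the introduction more concrete, here is a minimal sketch of agents that exchange speech signals rather than text. It is not the SpeechAgents implementation: the names `SpeechMessage`, `Agent`, `simulate`, and `dummy_responder` are hypothetical, the audio payload is treated as opaque bytes, the round-robin turn order is an assumption, and the multi-modal LLM is replaced by a stub callable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeechMessage:
    """A message exchanged between agents (hypothetical structure)."""
    speaker: str
    audio: bytes  # placeholder for a speech signal; the real encoding is not specified here


# Hypothetical stand-in for a multi-modal LLM:
# maps (agent name, dialogue history) to a spoken reply.
Responder = Callable[[str, List[SpeechMessage]], bytes]


@dataclass
class Agent:
    """One agent whose behavior is controlled by a single responder model."""
    name: str
    respond: Responder

    def speak(self, history: List[SpeechMessage]) -> SpeechMessage:
        return SpeechMessage(speaker=self.name, audio=self.respond(self.name, history))


def simulate(agents: List[Agent], turns: int) -> List[SpeechMessage]:
    """Round-robin turn taking (an assumption); every agent sees the shared history."""
    history: List[SpeechMessage] = []
    for t in range(turns):
        agent = agents[t % len(agents)]
        history.append(agent.speak(history))
    return history


def dummy_responder(name: str, history: List[SpeechMessage]) -> bytes:
    # Stub: a real system would run the multi-modal LLM here and emit speech.
    return f"{name}-turn-{len(history)}".encode()


if __name__ == "__main__":
    cast = [Agent("Alice", dummy_responder), Agent("Bob", dummy_responder)]
    for msg in simulate(cast, turns=4):
        print(msg.speaker, len(msg.audio))
```

Swapping `dummy_responder` for a function that calls an actual speech-capable model would leave the rest of the loop unchanged, since agents only ever see and produce `SpeechMessage` objects.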
