MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

A competition-based benchmark with quantitative metrics for Large Language Model powered multi-agent systems.
🐛 Report Bug · 📃 Main Page · 📖 Paper · 📊 Leaderboard


📌 MAgIC Benchmark News 🎉🔥

📖 About The Project

Scenarios

MAgIC provides a benchmark that quantitatively measures the Cognition, Adaptability, Rationality and Collaboration abilities of Large Language Models within multi-agent systems. Our benchmark is based on competitions in five scenarios:

  • Chameleon
  • Undercover
  • Cost Sharing
  • Prisoner's Dilemma
  • Public Good

PGM-Aware Agent Structure


Evaluation Metrics and Game Win Rate


(back to top)

Leaderboard

We have tested 10 models in our benchmark, and our proposed PGM method achieves a remarkable improvement.

Getting Started

Installation

  1. Environment preparation
# conda virtual environment
conda create -n magic_llm python=3.9
conda activate magic_llm
 
# or python3 virtual environment
mkdir magic_llm
python3 -m venv magic_llm
source magic_llm/bin/activate
  2. Install the required packages
pip3 install -r requirements.txt

Run competition and evaluation

  1. Get your own OpenAI API key and set it as $openai_api_key$
export OPENAI_API_KEY=$openai_api_key$
  2. Run experiments and calculate metrics. This code version currently supports only OpenAI models; to test your own LLM, please refer to our leaderboard website and upload your results.
python3 arena_runner.py
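Before launching `arena_runner.py`, it can save a failed run to confirm the API key is actually visible to the process. A minimal sketch (the helper name `check_openai_key` is ours, not part of this repo):

```python
import os
import sys


def check_openai_key() -> bool:
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("OPENAI_API_KEY"))


if __name__ == "__main__":
    if not check_openai_key():
        # `export OPENAI_API_KEY=...` must run in the same shell session.
        sys.exit("OPENAI_API_KEY is not set; export it before running arena_runner.py")
    print("API key found.")
```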

(back to top)

Roadmap

  • Upload relevant code
  • Add link to Leaderboard website
  • Introduce more scenarios and LLM results
  • Add Online Demo where human and various LLMs can play together

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Lin Xu- @Lin_Xu_ - cathyxl2016@gmail.com

(back to top)


Citation

@article{xu2023magic,
      title={MAgIC: Benchmarking Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration}, 
      author={Lin Xu and Zhiyuan Hu and Daquan Zhou and Hongyu Ren and Zhen Dong and Kurt Keutzer and See Kiong Ng and Jiashi Feng},
      year={2023},
      journal={arXiv preprint arXiv:2311.08562}
}
