MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

This repo features the code for the paper MuMA-ToM: Multi-modal Multi-Agent Theory of Mind.

It contains:

  • Instructions for utilizing the MuMA-ToM Benchmark
  • Implementation and guidelines for utilizing the LIMP model
  • Code for procedural generation of data

Language model-based Inverse Multi-agent Planning (LIMP)

We propose Language model-based Inverse Multi-agent Planning (LIMP), a novel method for multi-modal, multi-agent Theory of Mind reasoning.

To run LIMP on the MuMA-ToM benchmark, fill in your GPT API key in the files. We use GPT-4o for all our tasks.
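
As a minimal sketch of what "filling in the key" can look like, assuming the scripts use the official openai Python SDK and GPT-4o (the exact variable names in the repo files may differ):

```python
# Illustrative only: the repo's scripts may read the key from a different
# variable. This assumes the official openai Python SDK (>= 1.0) and GPT-4o.
import os
from openai import OpenAI

# Either export OPENAI_API_KEY in your shell or paste the key where the
# repo's files expect it.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "sk-..."))

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the agent's goal."}],
)
print(response.choices[0].message.content)
```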

We use the web version of Gemini 1.5 Pro in Google AI Studio for visual extraction, as it is more powerful than the API version.

For visual action extraction, upload each video to Google AI Studio. "actions_extracted.json" in the "Files" folder contains the prompt we use for each episode (keyed by id). Upload the corresponding video to Google AI Studio and put the model's output under the "actions" field of the matching entry in the JSON file (see the sketch below).
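
A rough sketch of that manual step is below. The per-episode "prompt" field name is an assumption; only the "actions" field and the "Files" folder are documented above.

```python
# Hedged sketch: read the prompt for one episode, paste it (with the video)
# into Google AI Studio, then store Gemini's answer under "actions".
import json

path = "Files/actions_extracted.json"
with open(path, "r") as f:
    episodes = json.load(f)

episode_id = "1"  # hypothetical id; use the id of the video you uploaded
print(episodes[episode_id]["prompt"])  # assumed field holding the prompt

episodes[episode_id]["actions"] = "<paste the Gemini 1.5 Pro output here>"

with open(path, "w") as f:
    json.dump(episodes, f, indent=2)
```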

Afterward, directly run LIMP.py.

MuMA-ToM benchmark

The MuMA-ToM benchmark is hosted on Hugging Face. Here is the link.

In the dataset, "questions.json" and "texts.json" contain the question text and the multimodal textual input for our benchmark. The "Videos" folder contains all the RGB videos, and the "full episode descriptions" folder contains GPT-generated descriptions of our interactive scenarios with ground-truth actions and utterances.
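
For example, a minimal way to load the two JSON files after downloading the dataset locally (the field names inside them are not documented here, so this only checks that they load):

```python
# Minimal loading sketch; assumes questions.json and texts.json have been
# downloaded from the Hugging Face dataset into the working directory.
import json

with open("questions.json", "r") as f:
    questions = json.load(f)
with open("texts.json", "r") as f:
    texts = json.load(f)

print(f"Loaded {len(questions)} questions and {len(texts)} text inputs")
```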

We also provide a training set of one thousand videos of multi-agent interactions in household environments. It is stored in the "training_set" folder, with the agents' actions as annotations.

If you need instance segmentation and depth images for further experiments, please contact us. The visual analysis results used to generate scene graphs from instance segmentation are stored in the "visual data" folder.

Citation

Please cite the paper and star this repo if you find it interesting or useful. Thanks!

@article{shi2024muma,
  title={MuMA-ToM: Multi-modal Multi-Agent Theory of Mind},
  author={Shi, Haojun and Ye, Suyu and Fang, Xinyu and Jin, Chuanyang and Isik, Leyla and Kuo, Yen-Ling and Shu, Tianmin},
  journal={arXiv preprint arXiv:2408.12574},
  year={2024}
}
