LAMM

LAMM (pronounced /læm/, meaning "cute lamb", a nod of appreciation to LLaMA) is a growing open-source community that helps researchers and developers quickly train and evaluate Multi-modal Large Language Models (MLLMs) and build multi-modal AI agents that bridge the gap between ideas and execution, enabling seamless interaction between humans and AI machines.

🌏 Project Page

Updates

📆 [2024-03]

Ch3Ef is available!
Ch3Ef released on arXiv!
Dataset and leaderboard are available!

📆 [2023-12]

DepictQA: Depicted Image Quality Assessment based on Multi-modal Language Models released on arXiv!
MP5: A Multi-modal LLM based Open-ended Embodied System in Minecraft released on arXiv!

📆 [2023-11]

ChEF: A comprehensive evaluation framework for MLLMs released on arXiv!
Octavius: Mitigating Task Interference in MLLMs by combining Mixture-of-Experts (MoEs) with LoRAs released on arXiv!
Camera-ready version of LAMM is available on arXiv.

📆 [2023-10]

LAMM has been accepted to the NeurIPS 2023 Datasets and Benchmarks Track! See you in December!

📆 [2023-09]

Light training framework for V100 or RTX3090 is available! LLaMA2-based finetuning is also online.
Our demo moved to OpenXLab.

📆 [2023-07]

LAMM checkpoints and leaderboard on Hugging Face have been updated for the new code base.
Evaluation code for both 2D and 3D tasks is ready.
Command line demo tools updated.

📆 [2023-06]

LAMM: 2D & 3D dataset & benchmark for MLLM
Watch the LAMM demo video on YouTube or Bilibili!
Full paper with Appendix is available on arXiv.
LAMM dataset released on Hugging Face & OpenDataLab for the research community!
LAMM code is available for the research community!

Paper List

Publications

Preprints

Citation

LAMM

@article{yin2023lamm,
    title={LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark},
    author={Yin, Zhenfei and Wang, Jiong and Cao, Jianjian and Shi, Zhelun and Liu, Dingning and Li, Mukai and Sheng, Lu and Bai, Lei and Huang, Xiaoshui and Wang, Zhiyong and others},
    journal={arXiv preprint arXiv:2306.06687},
    year={2023}
}

Assessment of Multimodal Large Language Models in Alignment with Human Values

@misc{shi2024assessment,
      title={Assessment of Multimodal Large Language Models in Alignment with Human Values}, 
      author={Zhelun Shi and Zhipin Wang and Hongxing Fan and Zaibin Zhang and Lijun Li and Yongting Zhang and Zhenfei Yin and Lu Sheng and Yu Qiao and Jing Shao},
      year={2024},
      eprint={2403.17830},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

ChEF

@misc{shi2023chef,
      title={ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models}, 
      author={Zhelun Shi and Zhipin Wang and Hongxing Fan and Zhenfei Yin and Lu Sheng and Yu Qiao and Jing Shao},
      year={2023},
      eprint={2311.02692},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Octavius

@misc{chen2023octavius,
      title={Octavius: Mitigating Task Interference in MLLMs via MoE}, 
      author={Zeren Chen and Ziqin Wang and Zhen Wang and Huayang Liu and Zhenfei Yin and Si Liu and Lu Sheng and Wanli Ouyang and Yu Qiao and Jing Shao},
      year={2023},
      eprint={2311.02684},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

DepictQA

@article{depictqa,
    title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
    author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
    journal={arXiv preprint arXiv:2312.08962},
    year={2023}
}

MP5

@misc{qin2023mp5,
  title         = {MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception}, 
  author        = {Yiran Qin and Enshen Zhou and Qichang Liu and Zhenfei Yin and Lu Sheng and Ruimao Zhang and Yu Qiao and Jing Shao},
  year          = {2023},
  eprint        = {2312.07472},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}

Get Started

Please see the tutorial for the basic usage of this repo.

License

The project is licensed under CC BY-NC 4.0 (non-commercial use only), and models trained using the dataset should not be used outside of research purposes.

OpenGVLab/LAMM
