A specialized implementation of the MiniGPT-4 demo that runs from the command line instead of the Gradio web interface.
Installation of MiniGPT-4
This section documents my personal experience setting up the fascinating MiniGPT-4 (7B); big thanks to the authors.
```bash
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
apt-get install git-lfs
git lfs install
git clone https://huggingface.co/lmsys/vicuna-7b-delta-v1.1
git clone https://huggingface.co/decapoda-research/llama-7b-hf
```

(Tip: for formal use, you should obtain the original checkpoints and tokenizer through the official Llama application process.)
Use the official FastChat tool to generate the final Vicuna weights under the FastChat directory:

```bash
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip3 install --upgrade pip
pip3 install -e .   # don't miss the trailing '.'
python -m fastchat.model.apply_delta --base {llama-13bOR7b-hf/} --target {weights/} --delta {vicuna-13bOR7b-delta-v1.1/}
```
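Conceptually, `apply_delta` recovers the Vicuna weights by adding the released delta tensors to the base Llama weights, parameter by parameter. A toy sketch of that idea (plain Python floats stand in for tensors; this is an illustration, not FastChat's actual code):

```python
# Toy illustration of delta-weight merging (NOT FastChat's real implementation).
# Vicuna is distributed as deltas; merging adds each delta to the matching
# base Llama parameter to reconstruct the target weights.

def apply_delta(base: dict, delta: dict) -> dict:
    """Return target weights where target[name] = base[name] + delta[name]."""
    assert base.keys() == delta.keys(), "base and delta must share parameter names"
    return {name: base[name] + delta[name] for name in base}

# Made-up parameter values for demonstration:
base = {"layer0.weight": 0.5, "layer0.bias": 2.0}
delta = {"layer0.weight": 0.25, "layer0.bias": 1.0}
target = apply_delta(base, delta)
print(target)  # {'layer0.weight': 0.75, 'layer0.bias': 3.0}
```

The real tool does the same elementwise addition over torch tensors, shard by shard, which is why it needs both the base model and the delta on disk at once.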
```bash
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
```
Download the pretrained MiniGPT-4 checkpoint (e.g., the 7B version).
Specify your image directory path and run:

```bash
conda activate minigpt4
pip install -U bitsandbytes   # upgrade to 0.38.1
python demo_localized.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0 --img-dir {image directory}
```
The script automatically reads all images in `--img-dir` and feeds them to MiniGPT-4 in turn. In `demo_localized.py`, the question is fixed, so the same prompt labels every image; feel free to move `user_message` inside the for loop to ask a different question per image.
Thanks to the whole community, including the authors of Llama, Vicuna, and MiniGPT-4, and the open-source contributors who helped me during the installation.
If you find this implementation helpful, please star the repo and raise issues so we can improve it together.
```bibtex
@article{dong2024modality,
  title={Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering},
  author={Dong, Junnan and Zhang, Qinggang and Zhou, Huachi and Zha, Daochen and Zheng, Pai and Huang, Xiao},
  journal={arXiv preprint arXiv:2402.12728},
  year={2024}
}
```