Model | MMBench Test (EN) | MMMU Val | SEED-IMG | AI2D Test | ScienceQA Test | HallusionBench aAcc | POPE | GQA | TextVQA | MME | MMStar | Configs |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LLaVA-v1.5-7B | 66.5 | 35.3 | 60.5 | 54.8 | 70.4 | 44.9 | 85.9 | 62.0 | 58.2 | 1511/348 | 30.3 | - |
LLaVA-Llama-3-8B | 68.9 | 36.8 | 69.8 | 60.9 | 73.3 | 47.3 | 87.2 | 63.5 | 58.0 | 1506/295 | 38.2 | Pretrain / Fine-tune |
LLaVA-Llama-3-8B-v1.1 | 72.3 | 37.1 | 70.1 | 70.0 | 72.9 | 47.7 | 86.4 | 62.6 | 59.0 | 1469/349 | 45.1 | Pretrain / Fine-tune |
LLaVA-Phi-3-mini | 69.2 | 41.4 | 70.0 | 69.3 | 73.7 | 49.8 | 87.3 | 61.5 | 57.8 | 1477/313 | 43.7 | Pretrain / Fine-tune |
- Official LLaVA format model (`xtuner/llava-phi-3-mini`): 🤗 HuggingFace / 🤖 ModelScope
- HuggingFace LLaVA format model (`xtuner/llava-phi-3-mini-hf`): 🤗 HuggingFace / 🤖 ModelScope
- XTuner LLaVA format model (`xtuner/llava-phi-3-mini-xtuner`): 🤗 HuggingFace / 🤖 ModelScope
- GGUF model (`xtuner/llava-phi-3-mini-gguf`): 🤗 HuggingFace / 🤖 ModelScope
- Pretrained projector weights: 🤗 HuggingFace / 🤖 ModelScope
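If you only want to try the released HuggingFace-format checkpoint listed above, it can be loaded directly with `transformers`. The snippet below is a minimal sketch, not taken from this repo; it assumes a `transformers` release with LLaVA support (plus `accelerate` for `device_map="auto"`) and the Phi-3 chat template shown in the prompt.

```python
# Minimal inference sketch for the released HuggingFace-format checkpoint.
# Assumptions: transformers with LLaVA support is installed, and the model
# follows the Phi-3 chat template used below.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "xtuner/llava-phi-3-mini-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# <image> marks where the image features are inserted into the prompt.
prompt = "<|user|>\n<image>\nWhat is shown in this image?<|end|>\n<|assistant|>\n"
url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # any test image
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```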
For data preparation, please refer to the docs here.
- Pretrain

```bash
NPROC_PER_NODE=8 xtuner train llava_phi3_mini_4k_instruct_clip_vit_large_p14_336_e1_gpu8_sharegpt4v_pretrain --deepspeed deepspeed_zero2 --seed 1024
```

- Fine-tune

```bash
NPROC_PER_NODE=8 xtuner train llava_phi3_mini_4k_instruct_full_clip_vit_large_p14_336_full_e2_gpu8_internvl_finetune --deepspeed deepspeed_zero2 --seed 1024
```
Step 0. Convert the `.pth` file to a LLaVA model in xtuner format (LLaVA-Phi-3-mini-xtuner)

After training, we will obtain a set of weights (i.e., `iter_xxx.pth`), which are not in the universal HuggingFace format. We first need to convert them to a LLaVA model in xtuner format.
```bash
xtuner convert pth_to_hf $FINETUNE_CFG $PTH_PATH $SAVE_PATH
# e.g., xtuner convert pth_to_hf llava_phi3_mini_4k_instruct_full_clip_vit_large_p14_336_full_e2_gpu8_internvl_finetune ./iter_39620.pth ./iter_39620_xtuner
```
```
./iter_39620_xtuner
├── added_tokens.json
├── config.json
├── model-00001-of-00004.safetensors
├── model-00002-of-00004.safetensors
├── model-00003-of-00004.safetensors
├── model-00004-of-00004.safetensors
├── model.safetensors.index.json
├── projector
│   ├── config.json
│   ├── configuration_projector.py
│   ├── modeling_projector.py
│   └── model.safetensors
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
├── tokenizer.model
└── visual_encoder
    ├── config.json
    ├── model.safetensors
    └── preprocessor_config.json
```
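As the layout shows, the xtuner-format directory keeps the LLM at the top level, with the projector and visual encoder stored as separate HuggingFace modules. A quick optional sanity check is sketched below; it assumes the projector's bundled `configuration_projector.py`/`modeling_projector.py` register it for `trust_remote_code` loading, as the file listing suggests.

```python
# Optional sanity check (not part of the official workflow): load the two
# auxiliary modules from the xtuner-format directory. Assumes the projector
# is loadable via trust_remote_code, as its bundled modeling files suggest.
import torch
from transformers import AutoModel, CLIPVisionModel

visual_encoder = CLIPVisionModel.from_pretrained(
    "./iter_39620_xtuner/visual_encoder", torch_dtype=torch.float16
)
projector = AutoModel.from_pretrained(
    "./iter_39620_xtuner/projector", torch_dtype=torch.float16, trust_remote_code=True
)
print(visual_encoder.config.hidden_size)  # CLIP ViT-L/14-336 feature width
print(projector)                          # module that maps CLIP features into the LLM
```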
At this point, the xtuner-format LLaVA model can be used for conversation with `xtuner chat`:
```bash
xtuner chat ./iter_39620_xtuner \
  --llava ./iter_39620_xtuner \
  --prompt-template phi3_chat \
  --image $IMAGE_PATH
```
and for MMBench evaluation with `xtuner mmbench`:
```bash
xtuner mmbench ./iter_39620_xtuner \
  --llava ./iter_39620_xtuner \
  --prompt-template phi3_chat \
  --data-path $DATA_PATH \
  --work-dir $RESULT_PATH
```
Here, `$DATA_PATH` refers to one of the MMBench datasets. You can download the expected data with:
```bash
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_DEV_EN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_TEST_EN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_DEV_CN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_TEST_CN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/CCBench.tsv
```
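To sanity-check a downloaded TSV before launching evaluation, you can inspect it with pandas. This is only a sketch; the exact columns are whatever the dataset release defines, so they are printed rather than assumed.

```python
# Quick look at a downloaded MMBench TSV; column names come from the release,
# so we print them instead of assuming them.
import pandas as pd

df = pd.read_csv("MMBench_DEV_EN.tsv", sep="\t")
print(df.shape)
print(list(df.columns))
```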
- The official LLaVA format is structured similarly to the architecture of the `liuhaotian/llava-v1.5-7b` model.
- The HuggingFace LLaVA format is structured similarly to the architecture of the `llava-hf/llava-1.5-7b-hf` model.

Since the official LLaVA format and the HuggingFace LLaVA format only support the Llama architecture as the LLM, we first need to convert the Phi-3 model to an equivalent Llama LLM.
```bash
python ./convert_phi_to_llama.py --phi_path ./iter_39620_xtuner --save_path ./iter_39620_xtuner_llama_llm
```
Here, `--phi_path` should specify the path to the Phi-3 model, i.e., the xtuner-format LLaVA model obtained in Step 0, and `--save_path` should specify where to save the converted Llama LLM.
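For intuition, Phi-3-mini is already Llama-shaped except that its attention and MLP projections are stored fused (`qkv_proj`, `gate_up_proj`). The sketch below illustrates the kind of weight splitting such a conversion performs under those assumptions; it is a simplified illustration, not the actual `convert_phi_to_llama.py`.

```python
# Conceptual illustration only; this is not the actual convert_phi_to_llama.py.
# Assumptions: Phi-3-mini stores fused projections qkv_proj = [q; k; v] and
# gate_up_proj = [gate; up] along the output dimension, and uses plain
# multi-head attention, so q/k/v are equally sized.
import torch
from transformers import AutoModelForCausalLM

phi = AutoModelForCausalLM.from_pretrained(
    "./iter_39620_xtuner", torch_dtype=torch.float16, trust_remote_code=True
)

llama_state_dict = {}
for name, tensor in phi.state_dict().items():
    if name.endswith("qkv_proj.weight"):
        q, k, v = tensor.chunk(3, dim=0)
        llama_state_dict[name.replace("qkv_proj", "q_proj")] = q
        llama_state_dict[name.replace("qkv_proj", "k_proj")] = k
        llama_state_dict[name.replace("qkv_proj", "v_proj")] = v
    elif name.endswith("gate_up_proj.weight"):
        gate, up = tensor.chunk(2, dim=0)
        llama_state_dict[name.replace("gate_up_proj", "gate_proj")] = gate
        llama_state_dict[name.replace("gate_up_proj", "up_proj")] = up
    else:
        llama_state_dict[name] = tensor

# Writing a matching Llama config and saving llama_state_dict is what the
# provided script takes care of.
```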
To official LLaVA format (LLaVA-Phi-3-mini)
The following command converts the model to the official LLaVA format:
```bash
python ./convert_xtuner_weights_to_llava.py \
  --text_model_id ./iter_39620_xtuner_llama_llm \
  --vision_model_id ./iter_39620_xtuner/visual_encoder \
  --projector_weight ./iter_39620_xtuner/projector/model.safetensors \
  --save_path ./iter_39620_llava
```
Here, the converted LLaVA model in official LLaVA format is saved to `./iter_39620_llava`.
```
./iter_39620_llava
├── added_tokens.json
├── config.json
├── generation_config.json
├── model-00001-of-00005.safetensors
├── model-00002-of-00005.safetensors
├── model-00003-of-00005.safetensors
├── model-00004-of-00005.safetensors
├── model-00005-of-00005.safetensors
├── model.safetensors.index.json
├── preprocessor_config.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── tokenizer.model
```
To HuggingFace LLaVA format (LLaVA-Phi-3-mini-hf)
The following command converts the model to the HuggingFace LLaVA format:
```bash
python ./convert_xtuner_weights_to_hf.py \
  --text_model_id ./iter_39620_xtuner_llama_llm \
  --vision_model_id ./iter_39620_xtuner/visual_encoder \
  --projector_weight ./iter_39620_xtuner/projector/model.safetensors \
  --save_path ./iter_39620_hf
```
Here, the converted LLaVA model in HuggingFace LLaVA format is saved to `./iter_39620_hf`.
```
./iter_39620_hf
├── added_tokens.json
├── config.json
├── generation_config.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
├── model.safetensors.index.json
├── preprocessor_config.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── tokenizer.model
```
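As a quick smoke test, the converted directory can be used like any other HuggingFace LLaVA checkpoint, e.g. through the `transformers` image-to-text pipeline. The snippet below is a sketch that assumes the Phi-3 chat template and an installed `accelerate` for `device_map="auto"`.

```python
# Smoke test for the converted HF-format model (sketch; Phi-3 prompt template assumed).
from transformers import pipeline

pipe = pipeline("image-to-text", model="./iter_39620_hf", device_map="auto")
prompt = "<|user|>\n<image>\nDescribe this image.<|end|>\n<|assistant|>\n"
out = pipe(
    "http://images.cocodataset.org/val2017/000000039769.jpg",  # any test image
    prompt=prompt,
    generate_kwargs={"max_new_tokens": 100},
)
print(out[0]["generated_text"])
```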