## Data preparation

We also provide the processed data below. The links point to Baidu Disk.

| Data group | Usage | Link |
| --- | --- | --- |
| LLaVA-PT | Stage 1 | LLaVA 1.5-558k |
| Hybird-FT | Stage 2 | SViT-157k, LVIS-220k, LRV-331k, MIMIC-IT-256k |
| LLaVA-FT | Stage 3 | LLaVA 1.5-mix-665k |

If you cannot easily access Baidu Disk, you can download the data from Hugging Face instead.
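For example, with the Hugging Face CLI (the dataset repo id below is a placeholder; substitute the actual repo linked from the project page):

```bash
# Placeholder repo id -- replace with the project's actual dataset repo.
huggingface-cli download <dataset-repo-id> \
    --repo-type dataset \
    --local-dir IMAGE_FOLDER
```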

After downloading all of them, organize the data as follows in `IMAGE_FOLDER`.

```
IMAGE_FOLDER
├── llava_image
├── llava_image_tune
├── lvis_tune
├── lrv_tune
├── svit_tune
└── mimicit_tune
    └── LA
```
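Before launching a run, a quick shell loop (a suggestion, not part of the release) can confirm that every expected subfolder is present:

```bash
IMAGE_FOLDER=/path/to/IMAGE_FOLDER  # adjust to your setup
for d in llava_image llava_image_tune lvis_tune lrv_tune svit_tune mimicit_tune/LA; do
    [ -d "$IMAGE_FOLDER/$d" ] || echo "missing: $IMAGE_FOLDER/$d"
done
```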

## Training

Specify your `IMAGE_FOLDER` and `JSON_FOLDER` according to the data preparation section above.

For training at 384 resolution, we use `google/siglip-so400m-patch14-384` as the image tower. Note that if you pass `--image_tower google/siglip-so400m-patch14-384`, you should upgrade `transformers` to 4.37.0.
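For example:

```bash
pip install "transformers==4.37.0"
```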

Training scripts are provided for each supported LLM backbone; a hedged example launch follows the list.

- Qwen
- Phi2
- StableLM
- OpenChat
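As an illustration only, a LLaVA-style stage-1 launch might look like the sketch below. The entry script name, flag set, DeepSpeed config path, and model id are assumptions based on common LLaVA-family repos, not this project's actual scripts; defer to the scripts shipped with the repo.

```bash
# Hypothetical stage-1 launch -- script path, flags, json name, and model id
# are assumptions; check the repo's own training scripts for the real command.
deepspeed path/to/train.py \
    --deepspeed path/to/zero2.json \
    --model_name_or_path Qwen/Qwen-1_8B \
    --image_tower google/siglip-so400m-patch14-384 \
    --image_folder ${IMAGE_FOLDER}/llava_image \
    --data_path ${JSON_FOLDER}/llava_pt.json \
    --bf16 True \
    --output_dir ./checkpoints/qwen-pretrain
```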