
[feat] Add finetune code for Yi-VL model #368

Open · wants to merge 11 commits into main
Conversation

@minlik commented Jan 30, 2024

The code is mostly modified from LLaVA

likuan and others added 8 commits January 29, 2024 18:36
Prerequistes -> Prerequisites
* [doc][feat] modified readme_CN.

* [doc][feat] modified readme_CN.

---------

Co-authored-by: YShow <66633207+Yimi81@users.noreply.github.com>
@Yimi81 (Contributor) commented Feb 1, 2024

@minlik Thank you for your PR; I will test it.

likuan and others added 3 commits February 1, 2024 18:27
* [doc][feat] modified readme.

* [doc][feat] modified readme.

* [doc][feat] modified readme.

* [doc][feat] modified readme.

* [doc][feat] modified readme.

* [doc][feat] modified readme.
@Yimi81 (Contributor) commented Feb 1, 2024

Can you provide your environment? Both the official LLaVA configuration and the requirements.txt you provided reported errors. It would be great if you could add steps to the readme for reproducing your environment.
The various errors I encountered:

- Import error
    from llava.train.llama_flash_attn_monkey_patch import replace_llama_attn_with_flash_attn
- ModuleNotFoundError: No module named 'llava'
- ImportError: cannot import name 'ShardedDDPOption' from 'transformers.trainer'
- ImportError: libcudart.so.12: cannot open shared object file: No such file or directory

@minlik (Author) commented Feb 1, 2024

Can you provide your environment? Both the official LLaVA configuration and the requirements.txt you provided reported errors. It would be great if you could add steps to the readme for reproducing your environment. The various errors I encountered:

- Import error
    from llava.train.llama_flash_attn_monkey_patch import replace_llama_attn_with_flash_attn
- ModuleNotFoundError: No module named 'llava'
- ImportError: cannot import name 'ShardedDDPOption' from 'transformers.trainer'
- ImportError: libcudart.so.12: cannot open shared object file: No such file or directory

I have updated the requirements in the latest commit here. I suggest reinstalling from the requirements.txt under the VL folder; these errors should then be resolved.
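For example, a clean reinstall could look like this (the environment name and clone location are placeholders, just a sketch):

# Fresh environment to avoid version conflicts with an existing llava install.
conda create -n yi-vl python=3.10 -y
conda activate yi-vl
cd Yi/VL
pip install -r requirements.txt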

- Import error
    from llava.train.llama_flash_attn_monkey_patch import replace_llama_attn_with_flash_attn
- ModuleNotFoundError: No module named 'llava'

I have added PYTHONPATH=../../:$PYTHONPATH in the training scripts so that the llava package can be found on the Python path.
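As a quick sanity check (a sketch, run from the same working directory the training scripts use):

# Should print the path of the llava package bundled in this PR
# instead of raising ModuleNotFoundError: No module named 'llava'.
PYTHONPATH=../../:$PYTHONPATH python -c "import llava; print(llava.__file__)"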

- ImportError: cannot import name 'ShardedDDPOption' from 'transformers.trainer'

I use transformers==4.34.0 because transformers removed ShardedDDPOption in 4.35. I will fix this later, but for now 4.34.0 works.
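If pip pulled in a newer release, pinning it back is enough for now:

# ShardedDDPOption was removed from transformers.trainer in 4.35,
# so stay on the last release that still exposes it.
pip install "transformers==4.34.0"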

- ImportError: libcudart.so.12: cannot open shared object file: No such file or directory

I think this error is related to a mismatch between the torch build and your CUDA version. You may need to reinstall some packages following the instructions here.
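A sketch of how I would check and fix this (the cu117 wheel index matches my CUDA 11.7 setup; adjust it for your driver):

# libcudart.so.12 usually means a package was built against CUDA 12
# while the machine only provides CUDA 11.x runtime libraries.
python -c "import torch; print(torch.__version__, torch.version.cuda)"

# If needed, reinstall torch/torchvision wheels built for CUDA 11.7:
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu117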

Part of my environment is listed below, with CUDA version 11.7:

accelerate==0.26.1
bitsandbytes==0.42.0
deepspeed==0.13.1
flash-attn==2.3.3
huggingface-hub==0.17.3
peft==0.8.1
tokenizers==0.14.1
torch==2.0.1
torchvision==0.15.2
transformers==4.34.0
wandb==0.16.2

@Yimi81 (Contributor) commented Feb 2, 2024

my sh script:

PYTHONPATH=../../:$PYTHONPATH \
deepspeed --include localhost:0,1 --master_port 1234 llava/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --lora_enable True \
    --model_name_or_path /ML-A100/public/tmp/pretrain_weights/Yi-VL-6B \
    --data_path /ML-A100/public/tmp/yiguofeng/contribute/Yi/data.json \
    --image_folder /ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/images \
    --vision_tower /ML-A100/public/tmp/pretrain_weights/Yi-VL-34B/vit/clip-vit-H-14-laion2B-s32B-b79K-yi-vl-34B-448 \
    --output_dir /ML-A100/public/tmp/VL-FT\
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --num_train_epochs 10 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --dataloader_num_workers 4 \
    --report_to wandb

error:

Traceback (most recent call last):
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/train/train_mem.py", line 13, in <module>
    train()
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/train/train.py", line 786, in train
    model = LlavaLlamaForCausalLM.from_pretrained(
  File "/ML-A100/public/tmp/miniconda3/envs/vl_test/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3085, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/model/llava_llama.py", line 44, in __init__
    self.model = LlavaLlamaModel(config)
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/model/llava_llama.py", line 36, in __init__
    super(LlavaLlamaModel, self).__init__(config)
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/model/llava_arch.py", line 32, in __init__
    self.vision_tower = build_vision_tower(config, delay_load=True)
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/model/clip_encoder/builder.py", line 11, in build_vision_tower
    return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs)
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/model/clip_encoder/clip_encoder.py", line 19, in __init__
    self.cfg_only = CLIPVisionConfig.from_pretrained(self.vision_tower_name)
  File "/ML-A100/public/tmp/miniconda3/envs/vl_test/lib/python3.10/site-packages/transformers/models/clip/configuration_clip.py", line 238, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/ML-A100/public/tmp/miniconda3/envs/vl_test/lib/python3.10/site-packages/transformers/configuration_utils.py", line 620, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/ML-A100/public/tmp/miniconda3/envs/vl_test/lib/python3.10/site-packages/transformers/configuration_utils.py", line 696, in _get_config_dict
    raise EnvironmentError(
OSError: Can't load the configuration of './vit/clip-vit-H-14-laion2B-s32B-b79K-yi-vl-6B-448'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure './vit/clip-vit-H-14-laion2B-s32B-b79K-yi-vl-6B-448'

How did you specify the vision_tower? Did you do the same as me?

@weijingxuan commented
Hello,

I've been trying to run the finetune script as per the instructions in VL/scripts/finetune.sh, but I keep encountering an error that I haven't been able to resolve. The script fails to execute properly, and the error seems to originate from the llama_flash_attn_monkey_patch.py file, specifically lines 87 to 89:
output_unpad = flash_attn_unpadded_qkvpacked_func(
    qkv, cu_q_lens, max_s, 0.0, softmax_scale=None, causal=True
)
The error message I receive is as follows: cu_seqlens_q must have shape (batch_size + 1)
I have followed all the setup instructions accurately and ensured that all dependencies are correctly installed.
I would greatly appreciate any guidance on the specific details of using this code or suggestions on how I might resolve this error. Is there a particular setup or dependency version that I should be aware of?

Thank you.

@minlik (Author) commented Feb 2, 2024

my sh script:

PYTHONPATH=../../:$PYTHONPATH \
deepspeed --include localhost:0,1 --master_port 1234 llava/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --lora_enable True \
    --model_name_or_path /ML-A100/public/tmp/pretrain_weights/Yi-VL-6B \
    --data_path /ML-A100/public/tmp/yiguofeng/contribute/Yi/data.json \
    --image_folder /ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/images \
    --vision_tower /ML-A100/public/tmp/pretrain_weights/Yi-VL-34B/vit/clip-vit-H-14-laion2B-s32B-b79K-yi-vl-34B-448 \
    --output_dir /ML-A100/public/tmp/VL-FT\
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --bf16 True \
    --num_train_epochs 10 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --save_total_limit 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --dataloader_num_workers 4 \
    --report_to wandb

error:

Traceback (most recent call last):
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/train/train_mem.py", line 13, in <module>
    train()
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/train/train.py", line 786, in train
    model = LlavaLlamaForCausalLM.from_pretrained(
  File "/ML-A100/public/tmp/miniconda3/envs/vl_test/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3085, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/model/llava_llama.py", line 44, in __init__
    self.model = LlavaLlamaModel(config)
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/model/llava_llama.py", line 36, in __init__
    super(LlavaLlamaModel, self).__init__(config)
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/model/llava_arch.py", line 32, in __init__
    self.vision_tower = build_vision_tower(config, delay_load=True)
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/model/clip_encoder/builder.py", line 11, in build_vision_tower
    return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs)
  File "/ML-A100/public/tmp/yiguofeng/contribute/Yi/VL/llava/model/clip_encoder/clip_encoder.py", line 19, in __init__
    self.cfg_only = CLIPVisionConfig.from_pretrained(self.vision_tower_name)
  File "/ML-A100/public/tmp/miniconda3/envs/vl_test/lib/python3.10/site-packages/transformers/models/clip/configuration_clip.py", line 238, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/ML-A100/public/tmp/miniconda3/envs/vl_test/lib/python3.10/site-packages/transformers/configuration_utils.py", line 620, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/ML-A100/public/tmp/miniconda3/envs/vl_test/lib/python3.10/site-packages/transformers/configuration_utils.py", line 696, in _get_config_dict
    raise EnvironmentError(
OSError: Can't load the configuration of './vit/clip-vit-H-14-laion2B-s32B-b79K-yi-vl-6B-448'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure './vit/clip-vit-H-14-laion2B-s32B-b79K-yi-vl-6B-448'

How did you specify the vision_tower? Did you do the same as me?


I set the parameter mm_vision_tower in config.json to the local ViT path, as suggested by the previous Yi-VL model readme. However, I just checked the latest readme and found that the suggestion has been removed. Something has changed and I will look into it further, but for now, simply setting mm_vision_tower in config.json to the local ViT path should work.
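For example, a minimal sketch of pointing mm_vision_tower at the local ViT folder (the paths below are placeholders for your own checkpoint layout):

# Placeholder paths; substitute your local Yi-VL-6B checkpoint and its vit/ subfolder.
MODEL_DIR=/path/to/Yi-VL-6B
VIT_DIR=$MODEL_DIR/vit/clip-vit-H-14-laion2B-s32B-b79K-yi-vl-6B-448

# Rewrite mm_vision_tower in config.json from the relative "./vit/..." value
# (which from_pretrained cannot resolve) to the absolute local path.
python - "$MODEL_DIR/config.json" "$VIT_DIR" <<'EOF'
import json, sys

cfg_path, vit_dir = sys.argv[1], sys.argv[2]
with open(cfg_path) as f:
    cfg = json.load(f)
cfg["mm_vision_tower"] = vit_dir
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
print("mm_vision_tower ->", cfg["mm_vision_tower"])
EOF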

@minlik (Author) commented Feb 2, 2024

Hello,

I've been trying to run the finetune script as per the instructions in VL/scripts/finetune.sh, but I keep encountering an error that I haven't been able to resolve. The script fails to execute properly, and the error seems to originate from the llama_flash_attn_monkey_patch.py file, specifically lines 87 to 89:

output_unpad = flash_attn_unpadded_qkvpacked_func(
    qkv, cu_q_lens, max_s, 0.0, softmax_scale=None, causal=True
)

The error message I receive is as follows: cu_seqlens_q must have shape (batch_size + 1)
I have followed all the setup instructions accurately and ensured that all dependencies are correctly installed. I would greatly appreciate any guidance on the specific details of using this code or suggestions on how I might resolve this error. Is there a particular setup or dependency version that I should be aware of?

Thank you.

I think it is related to the flash-attn or transformers version. You can try the environment listed below. Alternatively, you can simply avoid flash attention by changing train_mem.py to train.py in the script, to see whether that works (see the sketch after the environment list). Note that I have only tested finetune_qlora.sh and finetune_lora.sh, not the full finetune script finetune.sh, due to my GPU memory limit.

accelerate==0.26.1
bitsandbytes==0.42.0
deepspeed==0.13.1
flash-attn==2.3.3
huggingface-hub==0.17.3
peft==0.8.1
tokenizers==0.14.1
torch==2.0.1
torchvision==0.15.2
transformers==4.34.0
wandb==0.16.2
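To illustrate the train_mem.py -> train.py swap mentioned above, a minimal sketch, assuming you run from the VL folder and that finetune.sh launches llava/train/train_mem.py as in the command quoted earlier:

# train.py avoids the flash-attention monkey patch that train_mem.py applies,
# so this is a quick way to check whether flash-attn is the culprit.
sed -i 's#llava/train/train_mem.py#llava/train/train.py#' scripts/finetune.sh
bash scripts/finetune.sh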

@qiuhuiGithub commented Feb 22, 2024

Could you provide a sample of your training data? Is the data format the same as LLaVA's, with <image> simply changed to <image_placeholder>?

@minlik (Author) commented Feb 22, 2024

Could you provide a sample of your training data? Is the data format the same as LLaVA's, with <image> simply changed to <image_placeholder>?

Yes. See here
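For reference, a minimal sketch of a single record in data.json; the field names follow the LLaVA conversation format, and the image path and prompt here are placeholders:

# Writes a one-example dataset; the image token is <image_placeholder> for Yi-VL.
cat > data.json <<'EOF'
[
  {
    "id": "0",
    "image": "images/example.jpg",
    "conversations": [
      {"from": "human", "value": "<image_placeholder>\nWhat is shown in this image?"},
      {"from": "gpt", "value": "A short description of the image."}
    ]
  }
]
EOF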

@murray-z commented Mar 7, 2024

Hello, where can I see the latest finetuning code?
