Conversation

@kaixuanliu (Contributor)
… make it work for FSDP case

@kaixuanliu (Contributor, Author)
We are using FSDP to finetune a GPTQ model such as hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4, and it crashes at L442 because self.wf_unsqueeze_zero is initialized on the meta/CPU device at the post_init stage, while self.qzeros and self.qweight are allocated on the GPU device. This PR avoids that problem.
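To illustrate the failure mode, here is a minimal sketch, not the actual GPTQModel code: the tensor names, shapes, and shift-table construction are assumptions chosen to reproduce the cross-device crash and show the `.to(...)` guard this PR relies on.

```python
# Minimal sketch of the device mismatch described above (illustrative only).
import torch

bits = 4
# Shift table built at post_init time, so it stays on CPU (or meta).
wf = torch.arange(0, 32, bits, dtype=torch.int32).unsqueeze(0)

# Under FSDP, the packed weights have already been moved to the accelerator.
device = "cuda" if torch.cuda.is_available() else "cpu"
qweight = torch.randint(0, 2**31 - 1, (4, 16), dtype=torch.int32, device=device)

# Failing pattern when device != "cpu": the two operands live on different
# devices, so the bitwise op raises a device-mismatch RuntimeError.
#   torch.bitwise_right_shift(qweight.unsqueeze(1), wf.unsqueeze(-1))

# The guard: move the shift table to the same device as the packed weights.
unpacked = torch.bitwise_right_shift(
    qweight.unsqueeze(1), wf.to(qweight.device).unsqueeze(-1)
) & (2**bits - 1)
print(unpacked.shape)  # torch.Size([4, 8, 16]): eight 4-bit values per int32
```

Whether the merged fix moves the buffer once or per call isn't shown in this thread; the sketch only demonstrates the crash and the guard.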

@Qubitium Qubitium merged commit 71f3740 into ModelCloud:main Jul 2, 2025
@Qubitium (Collaborator) commented Jul 2, 2025

@kaixuanliu LGTM. Merged. Can you give a script example of FSDP plus GPTQ finetuning? We would like to add a CI test case to guard against future regressions.

@kaixuanliu (Contributor, Author)
Sure, I use the SFT example in peft, but it needs PR 2626 to be merged first. The related command line is below; an illustrative fsdp_config.yaml follows it.

accelerate launch --config_file "fsdp_config.yaml" train.py \
  --seed 100 \
  --model_name_or_path "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4" \
  --dataset_name "smangrul/ultrachat-10k-chatml" \
  --chat_template_format "chatml" \
  --add_special_tokens False --append_concat_token False \
  --splits "train,test" --max_seq_len 2048 \
  --num_train_epochs 1 --logging_steps 5 --log_level "info" \
  --logging_strategy "steps" --eval_strategy "epoch" --save_strategy "epoch" \
  --bf16 True --packing True \
  --learning_rate 1e-4 --lr_scheduler_type "cosine" \
  --weight_decay 1e-4 --warmup_ratio 0.0 --max_grad_norm 1.0 \
  --output_dir "llama-sft-lora-fsdp" \
  --per_device_train_batch_size 1 --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 4 \
  --gradient_checkpointing True --use_reentrant False \
  --dataset_text_field "content" --use_flash_attn False \
  --use_peft_lora True --lora_r 8 --lora_alpha 16 --lora_dropout 0.1 \
  --lora_target_modules "q_proj,k_proj,v_proj,o_proj,up_proj,gate_proj" \
  --use_4bit_quantization False

One reminder: please use transformers 4.52.4; the latest transformers has a bug with this example, which I am looking into.
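For reference, a sketch of an accelerate FSDP config in the style of the peft FSDP examples. The actual fsdp_config.yaml used above is not shown in this thread, so every value here is an assumption to adapt:

```yaml
# Illustrative accelerate FSDP config; assumed values, not from this thread.
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16  # matches --bf16 True above
num_machines: 1
num_processes: 2       # assumption: one node with two GPUs
rdzv_backend: static
same_network: true
use_cpu: false
```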

Merged commit 71f3740: … make it work for FSDP case
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>