RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x4096 and 1x8388608) #600

srikant86panda · 2023-07-17T14:01:33Z

Training with load_in_4bit leads to RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x4096 and 1x8388608)

Version used
transformers: 4.30.2
bitsandbytes: 0.40.0

import torch
from trl import SFTTrainer
from functools import partial
from datasets import ClassLabel, DatasetDict, load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )
    
model_name = "Salesforce/xgen-7b-8k-base"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, device_map={"": 0}, torch_dtype=torch.float16, load_in_4bit=True)

model = prepare_model_for_int8_training(model)

raw_dataset = load_dataset('tatsu-lab/alpaca', split='train')

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, bias='none', task_type='CAUSAL_LM')
model = get_peft_model(model, peft_config)
print_trainable_parameters(model)

training_args = TrainingArguments(output_dir='suit_script_SSL',
                                  per_device_train_batch_size=1,
                                  optim='adamw_torch',
                                  logging_steps=100,
                                  learning_rate=2e-4,
                                  fp16=True,
                                  warmup_ratio=0.1,
                                  lr_scheduler_type='linear',
                                  num_train_epochs=1,
                                  save_strategy="epoch",
                                  report_to='none'
                                )

trainer = SFTTrainer(model=model,
                     train_dataset=raw_dataset,
                     dataset_text_field='text',
                     max_seq_length=1024,
                     tokenizer=tokenizer,
                     args=training_args,
                     packing=True,
                     peft_config=peft_config
                    )
trainer.train()

Error detail
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
    outputs = self.model(
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 578, in forward
    layer_outputs = decoder_layer(
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x4096 and 1x8388608)
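
A note on the shapes in the error: 8388608 is exactly 4096 * 4096 / 2, which is what a 4096x4096 weight would occupy if 4-bit values were packed two per byte into a single column. The sketch below only checks that arithmetic. It assumes (this is not confirmed anywhere in this thread) that the 4-bit weight is stored in packed form and that the LoRA layer passes it to F.linear without dequantizing it first. The helper names are hypothetical and are not part of any library.

```python
# Shapes copied from the error message in this issue.
HIDDEN_SIZE = 4096          # second dimension of mat1 (1024x4096)
PACKED_LEN = 8_388_608      # second dimension of mat2 (1x8388608)


def packed_4bit_bytes(out_features: int, in_features: int) -> int:
    """Bytes needed for an out x in weight at 4 bits per value (two values per byte)."""
    n_values = out_features * in_features
    return (n_values + 1) // 2


def can_matmul(a_shape: tuple, b_shape: tuple) -> bool:
    """A 2-D matmul needs the inner dimensions to agree."""
    return a_shape[1] == b_shape[0]


# Assumption: q_proj in this model is a square HIDDEN_SIZE x HIDDEN_SIZE projection.
expected_packed = packed_4bit_bytes(HIDDEN_SIZE, HIDDEN_SIZE)
print(expected_packed == PACKED_LEN)

# What the traceback reports: activations times the (transposed) packed buffer.
print(can_matmul((1024, HIDDEN_SIZE), (1, PACKED_LEN)))

# What an unquantized (or dequantized) weight would give instead.
print(can_matmul((1024, HIDDEN_SIZE), (HIDDEN_SIZE, HIDDEN_SIZE)))
```

If that reading is right, the mismatch would come from the installed peft treating the 4-bit layer as an ordinary linear layer rather than from the dataset or the sequence length.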

tarungupta83 · 2023-07-17T14:38:20Z

Same issue here. I am running the same code in a Colab notebook and get this error:
│ │
│ /usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py:565 in forward │
│ │
│ 562 │ │ │ │ self.unmerge() │
│ 563 │ │ │ result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self. │
│ 564 │ │ elif self.r[self.active_adapter] > 0 and not self.merged: │
│ ❱ 565 │ │ │ result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self. │
│ 566 │ │ │ │
│ 567 │ │ │ x = x.to(self.lora_A[self.active_adapter].weight.dtype) │
│ 568 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x4096 and 1x8388608)

CRyan2016 · 2023-07-18T02:52:16Z

It may be caused by peft. I installed peft with git+https://github.com/huggingface/peft.git; peft-0.3.0 does not work.

srikant86panda · 2023-07-24T09:17:17Z

@CRyan2016 do you mean that updating PEFT got it working for you?

github-actions · 2023-12-20T15:15:10Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions bot closed this as completed Dec 30, 2023
