RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x4096 and 1x8388608) #600

Closed
srikant86panda opened this issue Jul 17, 2023 · 4 comments


srikant86panda commented Jul 17, 2023

Training with load_in_4bit=True fails with RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x4096 and 1x8388608)

Versions used
transformers: 4.30.2
bitsandbytes: 0.40.0

import torch
from trl import SFTTrainer
from functools import partial
from datasets import ClassLabel, DatasetDict, load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )
    
model_name = "Salesforce/xgen-7b-8k-base"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, device_map={"": 0}, torch_dtype=torch.float16, load_in_4bit=True)

model = prepare_model_for_int8_training(model)

raw_dataset = load_dataset('tatsu-lab/alpaca', split='train')

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, bias='none', task_type='CAUSAL_LM')
model = get_peft_model(model, peft_config)
print_trainable_parameters(model)

training_args = TrainingArguments(output_dir='suit_script_SSL',
                                  per_device_train_batch_size=1,
                                  optim='adamw_torch',
                                  logging_steps=100,
                                  learning_rate=2e-4,
                                  fp16=True,
                                  warmup_ratio=0.1,
                                  lr_scheduler_type='linear',
                                  num_train_epochs=1,
                                  save_strategy="epoch",
                                  report_to='none'
                                )

trainer = SFTTrainer(model=model,
                     train_dataset=raw_dataset,
                     dataset_text_field='text',
                     max_seq_length=1024,
                     tokenizer=tokenizer,
                     args=training_args,
                     packing=True,
                     peft_config=peft_config
                    )
trainer.train()

Error detail
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/peft/peft_model.py", line 678, in forward
    return self.base_model(
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 688, in forward
    outputs = self.model(
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 578, in forward
    layer_outputs = decoder_layer(
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 194, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/srikapan/anaconda3/envs/llm_peft/lib/python3.9/site-packages/peft/tuners/lora.py", line 565, in forward
    result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x4096 and 1x8388608)
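
A note on the odd second shape: 8388608 is exactly half of 4096 x 4096. That would be consistent with q_proj still holding a 4-bit packed weight (two 4-bit values per byte, flattened into one long dimension) when the LoRA layer passes it straight to F.linear. The sketch below only checks the arithmetic. The helper name packed_numel and the "two values per byte" packing are assumptions for illustration, not something confirmed in this thread.

```python
# Arithmetic check on the shapes in the error message.
# Assumption: a 4-bit quantized weight is stored packed, two values per byte.

def packed_numel(out_features: int, in_features: int, bits: int = 4) -> int:
    """Number of bytes needed to store an (out x in) weight at `bits` bits each."""
    values_per_byte = 8 // bits
    return (out_features * in_features) // values_per_byte

hidden_size = 4096            # second dimension of mat1 in the error
reported_mat2_dim = 8388608   # second dimension of mat2 in the error

# A square 4096 x 4096 projection, packed at 4 bits per value.
packed = packed_numel(hidden_size, hidden_size)
print(packed, packed == reported_mat2_dim)
```

If the match holds, the mismatch comes from treating the packed buffer as an ordinary dense weight rather than from the input data or sequence length.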

@tarungupta83

Same issue here. I am running the same code in a Colab notebook and getting this error:
│ │
│ /usr/local/lib/python3.10/dist-packages/peft/tuners/lora.py:565 in forward │
│ │
│ 562 │ │ │ │ self.unmerge() │
│ 563 │ │ │ result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self. │
│ 564 │ │ elif self.r[self.active_adapter] > 0 and not self.merged: │
│ ❱ 565 │ │ │ result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self. │
│ 566 │ │ │ │
│ 567 │ │ │ x = x.to(self.lora_A[self.active_adapter].weight.dtype) │
│ 568 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x4096 and 1x8388608)


CRyan2016 commented Jul 18, 2023

It may be caused by peft. I installed peft from git+https://github.com/huggingface/peft.git; peft-0.3.0 does not work.
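
To see which peft is actually installed before retrying, something like the following may help. It is a minimal sketch: the 0.4.0 threshold is an assumption (this thread only establishes that 0.3.0 fails), and parse_version, supports_4bit and installed_peft_version are hypothetical helper names, not part of peft.

```python
from importlib import metadata

# Assumption: 4-bit layers are handled from peft 0.4.0 onwards.
# The thread only confirms that peft-0.3.0 does not work.
MIN_PEFT_FOR_4BIT = (0, 4, 0)

def parse_version(version: str) -> tuple:
    """Turn '0.4.0.dev0' into (0, 4, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def supports_4bit(version: str) -> bool:
    """True if the version is at or above the assumed threshold."""
    return parse_version(version) >= MIN_PEFT_FOR_4BIT

def installed_peft_version():
    """Return the installed peft version string, or None if peft is missing."""
    try:
        return metadata.version("peft")
    except metadata.PackageNotFoundError:
        return None

current = installed_peft_version()
print("peft:", current)
if current is not None:
    print("meets assumed 4-bit threshold:", supports_4bit(current))
```

Newer peft releases also ship prepare_model_for_kbit_training, which is meant for 4-bit and 8-bit models, in place of the prepare_model_for_int8_training call used in the original script.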

@srikant86panda
Author

@CRyan2016 do you mean that updating PEFT got it working for you?


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
