
model saving error #81

Closed
imrankh46 opened this issue Apr 17, 2023 · 10 comments

@imrankh46

The trainer does not save the model weights; it gives me the following error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 14.75 
GiB total capacity; 12.97 GiB already allocated; 6.81 MiB free; 13.69 GiB 
reserved in total by PyTorch) If reserved memory is >> allocated memory try 
setting max_split_size_mb to avoid fragmentation.  See documentation for Memory 
Management and PYTORCH_CUDA_ALLOC_CONF
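
A side note on the message itself: the max_split_size_mb hint it mentions only takes effect if PYTORCH_CUDA_ALLOC_CONF is set before CUDA is initialized. A minimal sketch, with an illustrative 128 MiB value:

import os

# The caching allocator reads this variable when CUDA is first used,
# so set it before any tensor is placed on the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the setting so the allocator picks it up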

@Facico (Owner) commented Apr 18, 2023

Your error is exceeding the GPU memory limit, which should be unrelated to model saving. Did your program train properly while it was running?

@imrankh46 (Author)

> Your error is exceeding the GPU memory limit, which should be unrelated to model saving. Did your program train properly while it was running?

No, training runs through all the epochs fine; the error only shows up afterwards. We cannot save the LLaMA weights the way we save other models, using the trainer.save_model() or model.save_pretrained() methods.

@SunnyMarkLiu

Same error for me!

@Facico (Owner) commented Apr 19, 2023

What version of transformers are you using?

@imrankh46 (Author)

> Same error for me!

I solved the error. Just add this line before saving:

model.cpu()

and then save the model.
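
A minimal sketch of that workaround, assuming a Hugging Face model object (the output path is illustrative):

# Move the weights to host memory so saving does not allocate on the GPU.
model = model.cpu()

# Then save with the usual Hugging Face API.
model.save_pretrained("./saved_model")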

@imrankh46 (Author)

> What version of transformers are you using?

Same as yours.

@Facico (Owner) commented Apr 19, 2023

@imrankh46 Our transformers is pulled directly from GitHub, so there may be a slight difference. The commit hash of our transformers at the time was roughly ff20f9cf3615a8638023bc82925573cb9d0f3560. You may be able to solve the problem by uninstalling transformers and reinstalling it as "git+https://github.com/huggingface/transformers@ff20f9cf3615a8638023bc82925573cb9d0f3560".
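
Spelled out as commands, the reinstall from that commit would be:

pip uninstall -y transformers
pip install "git+https://github.com/huggingface/transformers@ff20f9cf3615a8638023bc82925573cb9d0f3560"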

@imrankh46 (Author)

> @imrankh46 Our transformers is pulled directly from GitHub, so there may be a slight difference. The commit hash of our transformers at the time was roughly ff20f9cf3615a8638023bc82925573cb9d0f3560. You may be able to solve the problem by uninstalling transformers and reinstalling it as "git+https://github.com/huggingface/transformers@ff20f9cf3615a8638023bc82925573cb9d0f3560".

I tried that, but it did not work. I think the LLaMA model code or tokenizer is written in C++. The model trains successfully.

After saving, it gives the CUDA out-of-memory error.

I will also try your approach.

@Facico (Owner) commented Apr 19, 2023

There is the same issue in other repos. You can also refer to their method and downgrade the version of bitsandbytes.
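
For example (the exact target version is an assumption; the 0.37.x series was the common downgrade suggested at the time):

pip uninstall -y bitsandbytes
pip install bitsandbytes==0.37.2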

@imrankh46 (Author)

> There is the same issue in other repos. You can also refer to their method and downgrade the version of bitsandbytes.

Thank you.
