Multi-GPU incremental training: the tokenizer step never runs after dataset processing finishes #3723

Closed
1 task done
pan-xi opened this issue May 13, 2024 · 0 comments
Labels
wontfix This will not be worked on

Comments

pan-xi commented May 13, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

Image used: nvcr.io/nvidia/pytorch:24.01-py3
Single-GPU training works fine.
Command run:
bash examples/lora_multi_gpu/ds_zero3.sh (bash examples/lora_multi_gpu/single_node.sh behaves the same way)

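In the YAML file, only the model name was changed.
[screenshot]
In the bash script, only the number of processes was changed.
[screenshot]
The config file was left unchanged.
[screenshot]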

Expected behavior

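[screenshot]
After the dataset is processed, the run just hangs and never prints any tokenizer information. accelerate and DeepSpeed both behave the same way.
Corresponding nvidia-smi status:
[screenshot]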
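One way to see where each rank is stuck when the run stalls like this is to have every process dump its Python stack traces after a timeout. The sketch below uses the standard-library `faulthandler` module; the helper name `arm_hang_watchdog` and the `hang_rank<N>.log` file naming are hypothetical, and it assumes the launcher exports a `RANK` environment variable (torchrun does), falling back to rank 0 otherwise.

```python
import faulthandler
import os


def arm_hang_watchdog(timeout_s: float, log_dir: str = ".", repeat: bool = False):
    """Hypothetical helper: if the process is still running after `timeout_s`
    seconds, write the stack of every Python thread to a per-rank log file.

    Assumes the launcher sets RANK (e.g. torchrun); defaults to "0" otherwise.
    """
    rank = os.environ.get("RANK", "0")
    path = os.path.join(log_dir, f"hang_rank{rank}.log")
    # faulthandler writes to this file's descriptor, so the file must stay
    # open until the dump fires or is cancelled; return it to the caller.
    log_file = open(path, "w")
    faulthandler.dump_traceback_later(timeout_s, repeat=repeat, file=log_file)
    return log_file, path
```

Calling it near the top of the training entry point and then calling `faulthandler.cancel_dump_traceback_later()` once training steps start would leave per-rank logs only when a rank actually stalls, which makes it easier to compare where the different processes are waiting.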

System Info

[screenshot]

Others

I referred to https://github.com/hiyouga/LLaMA-Factory/issues/1683, https://github.com/hiyouga/LLaMA-Factory/issues/1651, and https://github.com/hiyouga/LLaMA-Factory/issues/1135, but none of them solved the problem.

hiyouga added the pending label (This problem is yet to be addressed.) on May 13, 2024
hiyouga added the wontfix label (This will not be worked on) and removed the pending label on May 29, 2024
hiyouga closed this as not planned (Won't fix, can't repro, duplicate, stale) on May 29, 2024