
[Question] Pre-training time and pre-training data #43

Open · 5 tasks done
coye01 opened this issue Jun 16, 2023 · 6 comments
Labels: question (Further information is requested)

Comments

@coye01

coye01 commented Jun 16, 2023

Required prerequisites

Questions

  1. How long was the model trained on the thousand-GPU cluster?
  2. The README mentions pre-training on roughly 1.2T tokens; what is the language distribution of the data?
     Thanks for your reply!

Checklist

  • I have provided all relevant and necessary information above.
  • I have chosen a suitable title for this issue.
@coye01 coye01 added the question Further information is requested label Jun 16, 2023
@coye01 coye01 changed the title [Question] Pre-training time to [Question] Pre-training time and pre-training data Jun 16, 2023
@formath

formath commented Jun 16, 2023

A rough estimate: with a 7B model, 1.2 trillion tokens, 1,000 A800s, and 0.58 utilization, one epoch of training takes about 4 days.
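For reference, a minimal sketch of that estimate using the common 6·N·D training-FLOPs rule of thumb; the A800 peak throughput and the 0.58 utilization are assumptions carried over from this thread, not numbers confirmed by the maintainers:

```python
# Back-of-the-envelope check of the ~4-day figure (assumed, not official numbers).
params = 7e9            # 7B parameters
tokens = 1.2e12         # 1.2T training tokens
flops_needed = 6 * params * tokens           # ~5.0e22 FLOPs via the 6*N*D rule of thumb

gpus = 1000
peak_flops = 312e12     # assumed A800 BF16 dense peak, ~312 TFLOPS per GPU
utilization = 0.58      # utilization figure quoted in the comment above
cluster_flops = gpus * peak_flops * utilization

days = flops_needed / cluster_flops / 86400
print(f"~{days:.1f} days per epoch")         # ~3.2 days, consistent with "about 4 days"
```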

@miraclezqc

> A rough estimate: with a 7B model, 1.2 trillion tokens, 1,000 A800s, and 0.58 utilization, one epoch of training takes about 4 days.

The config looks like pure data parallelism; was tensor parallelism not enabled?

@formath

formath commented Jun 19, 2023

> A rough estimate: with a 7B model, 1.2 trillion tokens, 1,000 A800s, and 0.58 utilization, one epoch of training takes about 4 days.

> The config looks like pure data parallelism; was tensor parallelism not enabled?

My guess is that tensor and pipeline parallelism were enabled; otherwise it would be hard to reach 0.58 utilization.

@miraclezqc

> A rough estimate: with a 7B model, 1.2 trillion tokens, 1,000 A800s, and 0.58 utilization, one epoch of training takes about 4 days.

> The config looks like pure data parallelism; was tensor parallelism not enabled?

> My guess is that tensor and pipeline parallelism were enabled; otherwise it would be hard to reach 0.58 utilization.

A 7B model shouldn't need pipeline parallelism, and if TP is enabled it is probably only 2. With a seq length of 4096 and an inferred global batch size of 4M tokens, if micro batch size and gradient accumulation are both 1, then a thousand GPUs would mean pure DP, unless it's actually 2,000 GPUs...
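A quick sanity check of that arithmetic; the 4M-token global batch, micro batch size, and gradient accumulation values are the assumptions stated in the comment above:

```python
# How many GPUs the assumed batch configuration implies.
global_batch_tokens = 4 * 1024 * 1024        # assumed 4M-token global batch
seq_len = 4096
sequences_per_step = global_batch_tokens // seq_len          # 1024 sequences per step

micro_batch = 1
grad_accum = 1
dp_degree = sequences_per_step // (micro_batch * grad_accum) # 1024 data-parallel ranks

for tp in (1, 2):
    print(f"TP={tp}: {dp_degree * tp} GPUs")  # TP=1 -> 1024 (~1,000 GPUs, pure DP); TP=2 -> 2048
```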

@Luoyingfeng8

> A rough estimate: with a 7B model, 1.2 trillion tokens, 1,000 A800s, and 0.58 utilization, one epoch of training takes about 4 days.

> The config looks like pure data parallelism; was tensor parallelism not enabled?

> My guess is that tensor and pipeline parallelism were enabled; otherwise it would be hard to reach 0.58 utilization.

> A 7B model shouldn't need pipeline parallelism, and if TP is enabled it is probably only 2. With a seq length of 4096 and an inferred global batch size of 4M tokens, if micro batch size and gradient accumulation are both 1, then a thousand GPUs would mean pure DP, unless it's actually 2,000 GPUs...

A 7B model doesn't need TP on 80GB GPUs; sharding=8 within a node is enough, and across machines it is pure DP. That should give the best training speed and throughput.
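A rough memory estimate behind that suggestion, assuming the usual mixed-precision Adam state layout; the byte counts are assumptions, not measured figures from this repo:

```python
# Model-state memory for a 7B model, sharded ZeRO-style across one 8-GPU node.
params = 7e9
bytes_per_param = 2 + 2 + 4 + 4 + 4   # fp16 weights + fp16 grads + fp32 master + Adam m + Adam v
model_states_gib = params * bytes_per_param / 2**30          # ~104 GiB in total

per_gpu_gib = model_states_gib / 8    # sharding=8 within a node
print(f"total ~{model_states_gib:.0f} GiB, ~{per_gpu_gib:.0f} GiB per GPU")
# ~13 GiB of model states per 80 GiB A800 leaves plenty of room for activations,
# which is why no tensor parallelism is needed at 7B scale.
```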

@mynewstart

I'd like to ask: does this code load all the data into memory at once? If the dataset is very large, 1.4T tokens is roughly 5TB of data; wouldn't that not fit in memory?
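In case it helps, one common way to stream a corpus of that size without holding it in RAM is a memory-mapped dataset; this is only a hypothetical sketch (path, dtype, and sequence length are placeholders), not a description of how this repository's loader actually works:

```python
# Memory-map the token file so the OS pages data in lazily instead of loading ~5TB into RAM.
import numpy as np
import torch
from torch.utils.data import Dataset

class MemmapTokenDataset(Dataset):
    def __init__(self, path: str, seq_len: int = 4096, dtype=np.uint16):
        # np.memmap keeps the tokens on disk and reads them on demand.
        self.tokens = np.memmap(path, dtype=dtype, mode="r")
        self.seq_len = seq_len

    def __len__(self):
        return len(self.tokens) // self.seq_len

    def __getitem__(self, idx):
        start = idx * self.seq_len
        chunk = np.asarray(self.tokens[start : start + self.seq_len], dtype=np.int64)
        return torch.from_numpy(chunk)
```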
