[discuss] Distributed Paddle performance on a CPU cluster #1359
Why run only 50,000 samples on 100 nodes? Also, reducing trainer_count will not bring an order-of-magnitude performance improvement, though it can…
@hedaoyuan We are currently running performance tests comparing single-machine and multi-machine performance, which is why we used a relatively small dataset.
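The mismatch between the dataset size and the cluster size can be made concrete with a quick back-of-the-envelope calculation (a sketch using only the numbers quoted in this thread; the variable names are illustrative):

```python
# Back-of-the-envelope check of the data-vs-cluster-size mismatch.
# All numbers are taken from this thread; nothing here is measured.
total_samples = 50_000       # dataset size mentioned above
num_nodes = 100              # training nodes
batch_size_per_node = 2_000  # per-node batch_size

samples_per_node = total_samples / num_nodes
batches_per_node_per_epoch = samples_per_node / batch_size_per_node

print(samples_per_node)            # 500.0 samples per node per epoch
print(batches_per_node_per_epoch)  # 0.25 -- less than one full batch per node
```

With only 500 samples per node per epoch, each node cannot even fill a single batch of 2,000, so per-step synchronization and communication overhead is likely to dominate actual computation.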
The specific scenario is as follows:
2. The input data is a sequence (2 words long) used to predict the 3rd word; the vocabulary size is 2 million.
3. Training uses 100 nodes, with a per-node batch_size of 2000, trainer_count of 32, and momentum sync as the optimization method.
4. CPU utilization on each node is quite low.
The current training speed is 9.3 s per 10,000 samples, which is very slow. Is this performance in line with expectations, and are there any suggestions for optimization?
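For reference, the reported speed can be converted into throughput figures (a quick calculation from the numbers above; it makes no assumptions beyond them):

```python
# Convert the reported "9.3 s per 10,000 samples" into throughput figures.
samples = 10_000   # samples processed
seconds = 9.3      # reported wall-clock time
num_nodes = 100    # training nodes

cluster_throughput = samples / seconds            # samples/sec, whole cluster
per_node_throughput = cluster_throughput / num_nodes

print(round(cluster_throughput))       # 1075 samples/sec overall
print(round(per_node_throughput, 1))   # 10.8 samples/sec per node
```

At roughly 10.8 samples/sec per node, each node is nearly idle, which is consistent with the low CPU utilization reported above and suggests the run is likely dominated by communication rather than computation (the 2-million-word output vocabulary makes each parameter exchange especially heavy).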