[discuss] Distributed Paddle performance issue on a CPU cluster #1359

Closed
pkuyym opened this issue Feb 17, 2017 · 4 comments

@pkuyym
Contributor

pkuyym commented Feb 17, 2017

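The specific scenario is as follows:

  1. A single-unit simple_gru2 network, running on a CPU cluster.
  2. The input is a sequence of two words and the model predicts the third word; the vocabulary size is 2 million.
  3. Training uses 100 nodes, with a per-node batch_size of 2000 and a trainer_count of 32; the optimization method is momentum, sync.
  4. CPU utilization on each node is fairly low.

Training currently takes 9.3 s per 10,000 samples, which is very slow. Is this performance expected, and are there any suggestions for optimizing it?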
@pkuyym pkuyym changed the title from "[discuss] Distributed Paddle performance issue" to "[discuss] Distributed Paddle performance issue on a CPU cluster" Feb 17, 2017
@hedaoyuan
Contributor

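  1. If the cluster's CPUs support AVX, use an AVX build of paddle; it will be somewhat faster.
# paddle version
    with_avx: ON

  2. Try reducing trainer_count and increasing batch_size; that should improve training speed.
  3. "9.3 s per 10,000 samples" is not enough to judge by. How was this measured? How many samples are there in total?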
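To check point 1 on a node, one option on x86 Linux is to look for the `avx` flag in `/proc/cpuinfo`. The sketch below is only an illustration: the helper names `cpu_flags` and `supports_avx` are hypothetical and not part of paddle, and it assumes the x86 Linux `flags` line format.

```python
def cpu_flags(cpuinfo_text):
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux the feature list lives on a line starting with "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx(cpuinfo_text):
    """True if the 'avx' feature flag is present (hypothetical helper)."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX supported:", supports_avx(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

This only reports whether the CPU supports AVX; whether the installed paddle binary was built with AVX is what `with_avx: ON` in the `paddle version` output shows.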

@reyoung reyoung self-assigned this Feb 20, 2017
@pkuyym
Contributor Author

pkuyym commented Feb 20, 2017

@hedaoyuan

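  1. It is the AVX build.
  2. batch_size is already at its largest order of magnitude. As I understand it, reducing trainer_count should reduce thread contention and raise CPU utilization, but I don't think it can bring an order-of-magnitude improvement.
  3. There are 50,000 samples in total; the figure is the average over 6 passes, at 46.7 s per pass on average.

For now I want to see whether this performance is normal, and whether there is any configuration-level optimization that could improve it substantially (by tens of times).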
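The figures quoted above can be cross-checked with a few lines of arithmetic. This is just a sketch using the numbers from the thread (50,000 samples, 46.7 s per pass, 100 nodes); the variable names are illustrative.

```python
total_samples = 50_000     # dataset size stated in the thread
seconds_per_pass = 46.7    # average over 6 passes, as reported
num_nodes = 100            # training nodes, as reported

# Cluster-wide throughput and the equivalent time per 10,000 samples.
samples_per_second = total_samples / seconds_per_pass
seconds_per_10k = seconds_per_pass / (total_samples / 10_000)

# Average throughput contributed by each node.
samples_per_second_per_node = samples_per_second / num_nodes

print(f"{samples_per_second:.0f} samples/s across the cluster")
print(f"{seconds_per_10k:.2f} s per 10,000 samples")
print(f"{samples_per_second_per_node:.1f} samples/s per node")
```

The 9.34 s per 10,000 samples this gives agrees with the 9.3 s figure in the original report.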

@hedaoyuan
Contributor

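Why run 50,000 samples on 100 nodes? Also, reducing trainer_count will not bring an order-of-magnitude performance gain, but it can help with the low CPU utilization on each node.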
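A rough calculation shows why the node count looks large for this dataset. The sketch below assumes the 50,000 samples are split evenly across the 100 nodes and that each node's data is further divided among its trainer_count threads; neither assumption is confirmed in the thread.

```python
total_samples = 50_000   # dataset size stated in the thread
num_nodes = 100          # training nodes
batch_size = 2_000       # per-node batch size
trainer_count = 32       # trainer threads per node

# Assumption: data is partitioned evenly across nodes.
samples_per_node = total_samples // num_nodes

# Fraction of one configured batch that a node can fill per pass.
batches_per_node = samples_per_node / batch_size

# Assumption: a node's samples are divided evenly among its threads.
samples_per_thread = samples_per_node / trainer_count

print(f"{samples_per_node} samples per node per pass")
print(f"{batches_per_node:.2f} batches per node per pass")
print(f"{samples_per_thread:.1f} samples per thread per pass")
```

Under these assumptions each node sees 500 samples per pass, a quarter of one configured batch, so each thread has very little work between synchronization points. That would be consistent with the low per-node CPU utilization reported above.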

@pkuyym
Contributor Author

pkuyym commented Feb 20, 2017

@hedaoyuan This is a performance test comparing single-machine and multi-machine performance, so I am using a fairly small dataset.

@pkuyym pkuyym closed this as completed Aug 2, 2017
zhhsplendid pushed a commit to zhhsplendid/Paddle that referenced this issue Sep 25, 2019
* Rewrite firstn and shuffle functions, test=develop

* Rewrite firstn and shuffle functions, test=develop

* update, test=develop

* updata, test=develop

* update reader.shuffle and reader.firstn, test=develop
lizexu123 pushed a commit to lizexu123/Paddle that referenced this issue Feb 23, 2024
3 participants