update resnet50 benchmark data #5578

Merged 2 commits into PaddlePaddle:develop on Nov 22, 2017
Conversation

tensor-tang (Contributor)

MKLML and MKL-DNN data were tested with the latest Docker image.
OpenBLAS data was tested with a locally compiled build, since there is no OpenBLAS Docker image.

| BatchSize | 64    | 128   | 256   |
|-----------|-------|-------|-------|
| OpenBLAS  | 22.90 | 23.10 | 25.59 |
| MKLML     | 29.81 | 30.18 | 32.77 |
| MKL-DNN   | 80.49 | 82.89 | 83.13 |
luotao1 (Contributor), Nov 13, 2017

| BatchSize | 64    | 128   | 256   |
|-----------|-------|-------|-------|
| MKLML     | 25.77 | 23.98 | 24.27 |
| MKL-DNN   | 77.57 | 77.99 | 74.23 |

There are two differences in how the tests were run:

  • I tested on CentOS 6, while the PR used CentOS 7.
  • My CPU is a 6148, while the PR used a 6148M.

Could these two differences be why my measured values actually drop as the batch size increases?

tensor-tang (Contributor, Author)

I compared the percentages, (my value - your value) / your value, and the differences are quite large.
The MKLML differences are generally bigger than the MKL-DNN ones.

| BatchSize | 64     | 128    | 256    |
|-----------|--------|--------|--------|
| MKLML     | 15.68% | 25.85% | 35.02% |
| MKL-DNN   | 3.76%  | 6.28%  | 11.99% |
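As a quick sanity check, the first MKLML entry can be reproduced from the two tables above (a small awk sketch; any calculator works):

```bash
# (PR value - local value) / local value for MKLML at batch size 64:
# (29.81 - 25.77) / 25.77, expressed as a percentage.
awk 'BEGIN { printf "%.2f%%\n", (29.81 - 25.77) / 25.77 * 100 }'   # prints 15.68%
```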

Could you check the value of `cat /sys/devices/system/cpu/cpu*/online | grep -o '1' | wc -l`?

luotao1 (Contributor)

That value is 39, but I have 40 processors.

tensor-tang (Contributor, Author)

On my machine both values are 40.

luotao1 (Contributor)

Thanks to @BlackZhengSQ for the explanation: cpu0 does not support online/offline (the other CPUs do), but cpu0 is always active.
Watching with top inside Docker, close to all 40 cores are fully loaded:
(screenshot of top output)
So there is no missing core after all.
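For reference, a minimal way to see this cpu0 behavior from the shell (a sketch assuming the usual Linux sysfs layout, where cpu0 exposes no online control file):

```bash
# cpu0 typically has no "online" control file, so counting the '1' entries
# under /sys/devices/system/cpu/cpu*/online only covers cpus 1..39.
ls /sys/devices/system/cpu/cpu0/online 2>/dev/null || echo "cpu0 has no online file"
cat /sys/devices/system/cpu/cpu*/online | grep -c '^1$'   # 39 here, even though 40 CPUs are active
```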

tensor-tang (Contributor, Author), Nov 15, 2017

The MKLML gap is indeed quite large, and it grows as the batch size increases. Could you check whether NUMA is enabled? NUMA is used for memory allocation, and the 6148 should have 2 NUMA nodes.

luotao1 (Contributor)

Using the `lscpu` command: NUMA is not enabled on my machine.

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    1
Core(s) per socket:    20
CPU socket(s):         2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Stepping:              4
CPU MHz:               2401.000
BogoMIPS:              4804.43
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              28160K
NUMA node0 CPU(s):     0-39
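Since lscpu reports 2 sockets but only 1 NUMA node, NUMA is most likely disabled in the BIOS (often a "Node Interleaving" setting) or on the kernel command line. A rough way to check, assuming numactl is installed:

```bash
numactl --hardware                  # lists the NUMA nodes the kernel actually sees
grep -o 'numa=off' /proc/cmdline    # non-empty output means NUMA was disabled at boot
lscpu | grep -i numa                # the same summary quoted above
```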

luotao1 (Contributor)

After enabling NUMA, the numbers do match up, and are even slightly faster:

| BatchSize | 64    | 128   | 256   |
|-----------|-------|-------|-------|
| MKLML     | 33.32 | 31.68 | 33.12 |
| MKL-DNN   | 81.69 | 82.35 | 84.08 |
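For completeness, once both NUMA nodes are visible, memory placement can also be controlled explicitly with numactl when launching the benchmark. This is only a generic sketch; `train.sh` is a stand-in for whatever benchmark entry point is actually used, not a script from this PR:

```bash
# Interleave memory allocations across both NUMA nodes of the 2-socket 6148 box.
numactl --interleave=all ./train.sh

# Or pin both CPUs and memory to a single node to avoid cross-node traffic.
numactl --cpunodebind=0 --membind=0 ./train.sh
```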

tensor-tang (Contributor, Author)

Good news, then let's just use your data directly. After all, the code base you tested is also newer.

tensor-tang (Contributor, Author)

One more note: the NUMA issue here should also apply to the gap seen on VGG.

@luotao1 luotao1 merged commit f7fc6c2 into PaddlePaddle:develop Nov 22, 2017
@tensor-tang tensor-tang moved this from Doing to Done in Optimization on Intel Platform Nov 22, 2017
@tensor-tang tensor-tang deleted the benchmark branch November 22, 2017 03:02