
gpu benchmark does not support PyTorch 1.5.0. #77

Closed
gaoteng-git opened this issue Jun 8, 2020 · 6 comments · Fixed by #85 or #86
Labels
bug Something isn't working

Comments

@gaoteng-git

gaoteng-git commented Jun 8, 2020

Hello, I tried to use the latest PyTorch as the control group for the GPU benchmark, with the following configuration:
pytorch: 1.5.0
torchvision: 0.6.0
CUDA: 10.2
OS: Ubuntu 18.04
That is, I changed the corresponding line in Dockerfile.gpu to "conda install pytorch=1.5.0 torchvision=0.6.0 cudatoolkit=10.2 -c pytorch".

After building inside Docker, several test cases fail when running the tests:

Test project /tmp/build
Start 1: tt_core_test
1/12 Test #1: tt_core_test ..................... Passed 0.52 sec
Start 2: tt_kernels_test
2/12 Test #2: tt_kernels_test .................. Passed 29.18 sec
Start 3: bert_attention_test
3/12 Test #3: bert_attention_test ..............***Failed 4.50 sec
date time ( uptime ) [ thread name/id ] file:line v|
2020-06-08 13:10:51.358 ( 0.000s) [main thread ] loguru.cpp:610 INFO| arguments: turbo_transformers_cxx
2020-06-08 13:10:51.358 ( 0.000s) [main thread ] loguru.cpp:613 INFO| Current dir: /tmp/build/turbo_transformers/python
2020-06-08 13:10:51.358 ( 0.000s) [main thread ] loguru.cpp:615 INFO| stderr verbosity: 0
2020-06-08 13:10:51.358 ( 0.000s) [main thread ] loguru.cpp:616 INFO| -----------------------------------
FFFFFFFFFFFFFFFFFFFFFFFBertAttention "(1,010)" CPU Torch QPS, 492.80298203436234, time, 0.002029208500061941
BertAttention "(1,010)" CPU Turbo QPS, 1082.363535550833, time, 0.0009239040000466048

...

The following tests FAILED:
3 - bert_attention_test (Failed)
5 - bert_encoder_test (Failed)
6 - bert_intermediate_test (Failed)
7 - bert_layer_test (Failed)
8 - bert_model_test (Failed)
9 - bert_output_test (Failed)
10 - bert_pooler_test (Failed)

What is the highest PyTorch version currently supported by this GPU benchmark? And which PyTorch version were the benchmark results on the project's homepage compared against?

@feifeibear
Collaborator

The comparison currently uses 1.4.0. It is known that the results of tensor transpose and concat operations in PyTorch 1.5.0 are inconsistent with 1.4.0; whether this is a PyTorch bug or an intended change remains to be confirmed.
You could measure whether there is any performance difference between 1.4.0 and 1.5.0; there should be none.
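
A minimal sketch (not part of the original thread) of one way to compare the transpose and concat results across the two PyTorch versions: run the same deterministic script under 1.4.0 and 1.5.0, save the outputs, and diff the two files afterwards. The shapes and the file name are illustrative assumptions.

import torch

# Deterministic input so both PyTorch versions operate on identical data.
w = torch.arange(32, dtype=torch.float32).reshape(8, 4)

transposed = torch.clone(torch.t(w))      # the pattern used in from_torch
concatenated = torch.cat([w, w], dim=1)

# Save the per-version results; load both files in one interpreter later and
# compare them with torch.allclose() and .is_contiguous() to see what differs.
torch.save(
    {
        "transposed": transposed,
        "transposed_is_contiguous": transposed.is_contiguous(),
        "concatenated": concatenated,
    },
    "ops_" + torch.__version__ + ".pt",   # e.g. ops_1.5.0.pt
)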

@gaoteng-git
Author

The comparison currently uses 1.4.0. It is known that the results of tensor transpose and concat operations in PyTorch 1.5.0 are inconsistent with 1.4.0; whether this is a PyTorch bug or an intended change remains to be confirmed.
You could measure whether there is any performance difference between 1.4.0 and 1.5.0; there should be none.

Thanks! I tested it, and PyTorch 1.4.0 and 1.5.0 indeed show no performance difference on this GPU benchmark.

@feifeibear feifeibear added the bug Something isn't working label Jun 15, 2020
@feifeibear feifeibear changed the title from PyTorch version support in the gpu benchmark to gpu benchmark does not support PyTorch 1.5.0 Jun 15, 2020
@feifeibear feifeibear reopened this Jun 15, 2020
@feifeibear
Collaborator

The member function from_torch of the BertModelWithPooler and BertModel classes does not support PyTorch 1.5.0. In my opinion, the tensor transpose API of PyTorch is not stable. We use the following way to transpose weight matrices.

weight = torch.clone(torch.t(pooler_params['dense.weight']))

I have no idea why it does not work as expected in PyTorch 1.5.0.
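
One possible explanation (my assumption, not confirmed in this thread) is that clone() applied to a transposed view may preserve the view's strides in PyTorch 1.5 instead of returning a contiguous copy. A minimal sketch of a layout-independent alternative, with a dummy pooler_params standing in for the real parameters:

import torch

# Stand-in for the pooler_params mapping referenced above, just to keep the
# snippet self-contained; the real dict comes from the torch model being converted.
pooler_params = {"dense.weight": torch.randn(768, 768)}

# .contiguous() forces a dense, row-major copy of the transposed view, which
# does not depend on how clone() treats the view's strides in a given version.
weight = torch.t(pooler_params["dense.weight"]).contiguous()
assert weight.is_contiguous()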

@feifeibear feifeibear changed the title to the English gpu benchmark does not support PyTorch 1.5.0. Jun 15, 2020
@feifeibear feifeibear linked a pull request Jun 15, 2020 that will close this issue
@feifeibear
Collaborator

The bug will be fixed in version v0.3.0.

@feifeibear feifeibear linked a pull request Jun 28, 2020 that will close this issue
@feifeibear
Collaborator

The bug is fixed!

@gaoteng-git
Author

Great work!
