[NPU] Process takes a long time to exit after running import paddle #785

Closed
zhjc opened this issue Sep 7, 2023 · 3 comments
Comments


zhjc commented Sep 7, 2023

Paddle commit id: 12301bc5337fa3bc2d07050d240fbac3689fa9ce
PaddleCustomDevice: latest develop branch, built from source and installed as a whl package.

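Running a single statement, import paddle, makes the process take a long time to exit. On this eight-card machine, npu-smi info shows a process appearing on each card in turn, and the program only exits after the processes on all cards have disappeared. Example: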
```text
+===========================+===============+====================================================+
| No running processes found in NPU 0 |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+
| No running processes found in NPU 2 |
+===========================+===============+====================================================+
| No running processes found in NPU 3 |
+===========================+===============+====================================================+
| 4 0 | 4729 | python | 65 |
+===========================+===============+====================================================+
| No running processes found in NPU 5 |
+===========================+===============+====================================================+
| No running processes found in NPU 6 |
+===========================+===============+====================================================+
| No running processes found in NPU 7 |
+===========================+===============+====================================================+
```
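To put a number on the slow exit independently of any profiler, it can be timed from outside the process. The sketch below is a hypothetical helper, not part of Paddle or PaddleCustomDevice. It runs a snippet in a fresh interpreter, has the child print a wall-clock timestamp as its last statement, and reports roughly how long the process took to actually exit after that point. The names measure_teardown and MARKER are made up for illustration. The single-card comparison assumes the installed CANN version honors the ASCEND_RT_VISIBLE_DEVICES environment variable. It is only meant to check whether the exit cost grows with the number of visible cards.

```python
import os
import subprocess
import sys
import time

MARKER = "__T__="  # hypothetical prefix for the child's final timestamp line


def measure_teardown(code, extra_env=None):
    """Run `code` in a fresh Python interpreter and return the approximate
    seconds between the last statement of `code` and the child's exit."""
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    # Append a final statement that prints a wall-clock timestamp;
    # time.time() is used because it is comparable across processes.
    script = code + f"\nimport time\nprint('{MARKER}' + repr(time.time()), flush=True)\n"
    proc = subprocess.run([sys.executable, "-c", script], env=env,
                          stdout=subprocess.PIPE, text=True)
    exited_at = time.time()
    if proc.returncode != 0:
        raise RuntimeError(f"child exited with code {proc.returncode}")
    stamps = [line[len(MARKER):] for line in proc.stdout.splitlines()
              if line.startswith(MARKER)]
    if not stamps:
        raise RuntimeError("child did not report a timestamp")
    # Everything after the printed timestamp is interpreter shutdown
    # (atexit handlers, module cleanup, native runtime teardown).
    return exited_at - float(stamps[-1])


if __name__ == "__main__":
    print(f"baseline (empty interpreter): {measure_teardown('pass'):.3f} s")
    try:
        t_all = measure_teardown("import paddle")
        print(f"import paddle, all NPUs visible: {t_all:.3f} s")
        # Assumption: CANN honors ASCEND_RT_VISIBLE_DEVICES. If the exit cost
        # is per card, exposing a single card should shrink it noticeably.
        t_one = measure_teardown("import paddle", {"ASCEND_RT_VISIBLE_DEVICES": "0"})
        print(f"import paddle, only NPU 0 visible: {t_one:.3f} s")
    except RuntimeError as err:
        print(f"paddle measurement skipped: {err}")
```

Comparing the import paddle figure against the empty-interpreter baseline isolates the extra exit cost. If that cost tracks the number of visible cards, it would be consistent with the per-card processes seen in npu-smi info.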

Collaborator

qili93 commented Mar 18, 2024

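Hello, you could try enabling the CPU and NPU profilers to see where most of the time goes while the process is being released. I suspect it may be caused by the release of the NPU CANN cache. You can use the example below and inspect the resulting timeline. Thanks!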

```python
import time

import paddle
import paddle.nn as nn
import paddle.profiler as profiler

# Initialize the profiler: collect host (CPU) events and NPU (custom device) events
prof = profiler.Profiler(
    targets=[profiler.ProfilerTarget.CPU, profiler.ProfilerTarget.CUSTOM_DEVICE],
    custom_device_types=['npu'])

paddle.set_device("npu")

EPOCH_NUM = 1
BATCH_NUM = 5000

for epoch_id in range(EPOCH_NUM):
    epoch_start = time.time()
    for iter_id in range(BATCH_NUM):
        if iter_id == 100:
            prof.start()  # start the profiler here, after 100 warm-up iterations
        layer = nn.Sequential(
            nn.Conv2D(in_channels=3, out_channels=6,
                      kernel_size=5, stride=1, padding=0),  # Input: 256,3,224,224 => Output: 256,6,220,220
            nn.BatchNorm2D(num_features=6))  # Input: 256,6,220,220 => Output: 256,6,220,220
        x = paddle.rand(shape=[256, 3, 224, 224])
        output = layer(x)
        if iter_id == 100:
            prof.stop()  # stop the profiler here, after one profiled iteration
            break
        if (iter_id + 1) % 100 == 0:
            print(f"Iter[{iter_id}/{BATCH_NUM}] - output.shape={output.shape}")
    epoch_cost = time.time() - epoch_start
    print(f"Epoch[{epoch_id}/{EPOCH_NUM}] - CONV+BN iterations: {iter_id + 1}, Time Cost: {epoch_cost}")

print(output.shape)
```

After collection, the performance data still has to be parsed; see https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC1alpha002/devaids/auxiliarydevtool/atlasprofiling_16_0024.html for how to do this.

Collaborator

qili93 commented Apr 9, 2024

Hello, did the answer above solve your problem? Thanks!

Collaborator

qili93 commented May 22, 2024

Closing since there are no further comments, thanks!

qili93 closed this as completed May 22, 2024