
[Bug] ImageEncoder INFO log timing is inaccurate #1759

Closed
2 tasks
DefTruth opened this issue Jun 12, 2024 · 3 comments
@DefTruth
Contributor
Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/vl/engine.py#L103-L110

The current way ImageEncoder measures elapsed time:

def forward(self, inputs: List[Image]):
    """Model forward."""
    time_start = time.perf_counter()
    outputs = self.model.forward(inputs)
    time_end = time.perf_counter()
    logger.info(f'ImageEncoder forward {len(inputs)} images, '
                f'cost {time_end - time_start:.3f}s')
    return outputs

Because PyTorch launches CUDA work on asynchronous streams, the time measured this way differs drastically from what you get after adding torch.cuda.synchronize() before and after the forward call:

def forward(self, inputs: List[Image]):
    """Model forward."""
    torch.cuda.synchronize()  # stream synchronization
    time_start = time.perf_counter()
    outputs = self.model.forward(inputs)
    torch.cuda.synchronize()  # stream synchronization
    time_end = time.perf_counter()
    logger.info(f'ImageEncoder forward {len(inputs)} images, '
                f'cost {time_end - time_start:.3f}s')
    return outputs

For example, with InternViT-6B on 2x L20:

| w/o sync | w/ sync |
| -------- | ------- |
| ~40 ms   | ~800 ms |
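The gap above can be reproduced without any GPU. The following is a minimal pure-Python analogue (using a thread pool in place of a CUDA stream, so `fake_gpu_kernel` and both timings are illustrative, not lmdeploy code): timing an asynchronous launch without waiting for completion measures almost nothing, just as timing `model.forward` without `synchronize()` does.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_gpu_kernel():
    time.sleep(0.2)  # stands in for real device work
    return "done"


with ThreadPoolExecutor(max_workers=1) as pool:
    # Naive timing: submit() returns immediately, so this only measures
    # the launch overhead, not the work itself.
    t0 = time.perf_counter()
    future = pool.submit(fake_gpu_kernel)
    naive = time.perf_counter() - t0
    future.result()  # drain the queue before the next measurement

    # "Synchronized" timing: wait for completion before stopping the
    # clock, analogous to torch.cuda.synchronize() after forward().
    t0 = time.perf_counter()
    pool.submit(fake_gpu_kernel).result()
    synced = time.perf_counter() - t0

print(f"naive {naive:.3f}s, synced {synced:.3f}s")
```

Here `naive` is a few microseconds while `synced` is about 0.2 s; the ~40 ms vs ~800 ms numbers above show the same effect on real hardware.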

Reproduction

None

Environment

None

Error traceback

None
@DefTruth
Contributor Author

One more note: torch.cuda.synchronize() by default only synchronizes the stream on device 0. If the vision model is distributed across multiple GPUs, each device must be synchronized individually; otherwise the measured time is roughly 1/N of the real time (N = number of GPUs).
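A sketch of that per-device synchronization, assuming torch is installed (the import is deferred so the snippet can be defined without a GPU; the helper name `synchronize_all_devices` is hypothetical, not lmdeploy API):

```python
def synchronize_all_devices():
    """Wait for outstanding CUDA work on every visible device.

    torch.cuda.synchronize() with no argument only waits on the current
    device, so for a vision model sharded across N GPUs each device id
    must be passed explicitly.
    """
    import torch  # deferred so this sketch loads without torch/CUDA

    for device_id in range(torch.cuda.device_count()):
        torch.cuda.synchronize(device_id)
```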

@irexyc
Collaborator

irexyc commented Jun 12, 2024

Simply fetching the output as a CPU tensor here should be enough, right?

@DefTruth
Contributor Author

> Simply fetching the output as a CPU tensor here should be enough, right?

Good point, that is simpler.
