https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/vl/engine.py#L103-L110
The current way ImageEncoder measures its forward time:
```python
def forward(self, inputs: List[Image]):
    """Model forward."""
    time_start = time.perf_counter()
    outputs = self.model.forward(inputs)
    time_end = time.perf_counter()
    logger.info(f'ImageEncoder forward {len(inputs)} images, '
                f'cost {time_end - time_start:.3f}s')
    return outputs
```
Because PyTorch launches CUDA kernels asynchronously, the time measured this way differs dramatically from the time obtained after adding torch.cuda.synchronize() before and after the forward call:
```python
def forward(self, inputs: List[Image]):
    """Model forward."""
    torch.cuda.synchronize()  # stream synchronization
    time_start = time.perf_counter()
    outputs = self.model.forward(inputs)
    torch.cuda.synchronize()  # stream synchronization
    time_end = time.perf_counter()
    logger.info(f'ImageEncoder forward {len(inputs)} images, '
                f'cost {time_end - time_start:.3f}s')
    return outputs
```
For example, with InternViT-6B on 2× L20 GPUs:
One more note: torch.cuda.synchronize() by default only synchronizes the stream on device 0. If the vision model is distributed across multiple GPUs, each device must be synchronized individually; otherwise the measured time ends up being 1/N of the real time (N = number of GPUs).
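A per-device synchronization loop might look like the following (a minimal sketch; the helper name is illustrative, not from lmdeploy):

```python
import torch


def synchronize_all_devices():
    """Wait for pending kernels on every visible GPU, not just device 0."""
    # torch.cuda.synchronize() with no argument only synchronizes the
    # current device, so iterate over all devices before reading the clock.
    for device_id in range(torch.cuda.device_count()):
        torch.cuda.synchronize(device_id)
```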
Fetching a CPU tensor directly here should be sufficient, shouldn't it?
Makes sense, that's simpler.
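To illustrate the suggestion (a sketch under the assumption that the encoder output is a CUDA tensor; function and variable names are hypothetical): copying the result to host memory blocks until the kernels that produce it have finished, so the copy itself acts as the synchronization point for timing.

```python
import time
import torch


def timed_forward(model, inputs):
    """Time a forward pass, using the device-to-host copy as the sync point."""
    time_start = time.perf_counter()
    outputs = model(inputs)
    # .cpu() blocks until all pending kernels producing `outputs` complete,
    # so no explicit torch.cuda.synchronize() is needed before stopping
    # the timer. (On a CPU tensor this copy is a no-op.)
    outputs = outputs.cpu()
    time_end = time.perf_counter()
    return outputs, time_end - time_start
```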
irexyc