
perf(torch): fast but unsafe buildATen & eliminating dispatches #1271

Merged: 8 commits into DeepLink-org:main from llj/buildaten on Jun 26, 2024

Conversation

@lljbash (Contributor) commented on Jun 24, 2024:

Motivation and Context

Aims to speed up the torch impl.

Description

  1. Add a compile option that enables a fast but unsafe version of buildATen (not thread-safe, and it presumes the upper layer passes in torch Tensors). It is off by default; the intended usage is to turn it on in dipu. A minimal sketch follows this list.
  2. Remove most of the original dispatches by manually calling the post-dispatch functions directly; see the mm sketch after the measurement below.
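A minimal sketch of that fast path, under the contract spelled out in the reviewed code comment further down (the handle really is a reinterpret_cast of an at::Tensor*). The option name and the handle typedef here are illustrative stand-ins, not the actual DIOPI symbols:

#include <ATen/ATen.h>

// Stand-in for the DIOPI handle type; the real one comes from the DIOPI
// runtime headers.
struct diopiTensor;
using diopiConstTensorHandle_t = const diopiTensor*;

#ifdef DIOPI_FAST_UNSAFE_BUILD_ATEN  // hypothetical name for the new option
// UNSAFE fast path: valid only when the upper layer (e.g. dipu) passes a
// live at::Tensor* disguised as a handle and only uses it on one thread.
inline const at::Tensor& buildATen(diopiConstTensorHandle_t tensor) {
  return *reinterpret_cast<const at::Tensor*>(tensor);
}
#endif

No wrapper object is constructed at all, which is where the speedup over the safe path comes from.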

In testing, for a simple operator such as mm, per-operator CPU time is now almost exactly on par with native torch.
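A sketch of the dispatch elimination in item 2, again using mm. It assumes a PyTorch version whose codegen emits per-backend headers for structured ops; the wrapper function name is illustrative:

#include <ATen/ATen.h>
#include <ATen/ops/mm_cpu_dispatch.h>  // generated per-backend header

// at::mm(a, b) would round-trip through the dispatcher; at::cpu::mm (or
// at::cuda::mm from ATen/ops/mm_cuda_dispatch.h) jumps straight to the
// registered backend kernel and skips the dispatch bookkeeping.
at::Tensor mm_direct(const at::Tensor& a, const at::Tensor& b) {
  return at::cpu::mm(a, b);
}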

Use cases (Optional)

BC-breaking (Optional)

Checklist

Before PR:

  • I have read and followed the workflow described in Contributors.md to create this PR.
  • Pre-commit or the linting tools indicated in Contributors.md have been used to fix potential lint issues.
  • Bug fixes are covered by unit tests; the case that caused the bug is added to the unit tests.
  • New functionality is covered by complete unit tests; if not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, including docstrings or example tutorials.

After PR:

  • The CLA has been signed by all committers in this PR.

@lljbash requested a review from yangbofun as a code owner on June 24, 2024.
@lljbash added the nvidia label on June 24, 2024.
}

// This macro is designed to avoid early destruction of the wrapper when building an optional at::Tensor.
#define DIOPI_IMPL_BUILD_ATEN_LIST(atTensor, diopiTensors, numTensors) \
Collaborator commented:
Suggested change:
- #define DIOPI_IMPL_BUILD_ATEN_LIST(atTensor, diopiTensors, numTensors) \
+ #define DIOPI_IMPL_BUILD_ATEN_LIST(atTensors, diopiTensors, numTensors) \

// WARNING: This function is UNSAFE. It is the caller's responsibility to ensure that:
// 1. The returned wrapper is not destroyed when its sliced at::Tensor is still in use in DIOPI.
// 2. The input diopiConstTensorHandle_t is actually a reinterpret_cast of an at::Tensor*.
// 3. The input tensor is only used in one thread.
Collaborator commented:

Why does this comment need to call out thread safety? PyTorch's at::Tensor does not guarantee thread safety either, so ours need not guarantee it; letting the upper-layer caller ensure it is enough.

@lljbash (Contributor, author) replied:

Generally, read-only access should be safe, but the way I do it here, even const gets forcibly cast away.

I also thought of another scenario: if a new view is created from a different tensor sharing the same storage, the device might end up wrong too.

@yangbofun (Collaborator) commented on Jun 25, 2024:

> Generally, read-only access should be safe, but the way I do it here, even const gets forcibly cast away.
>
> I also thought of another scenario: if a new view is created from a different tensor sharing the same storage, the device might end up wrong too.

That problem only arises if the storage is touched. So can we leave the storage alone and change the device inside the at::Tensor instead?

Collaborator commented:

If we override at::TensorImpl's member function device_type() in the wrapper subclass so that it returns cuda, wouldn't that solve this problem?

@lljbash (Contributor, author) replied:

> If we override at::TensorImpl's member function device_type() in the wrapper subclass so that it returns cuda, wouldn't that solve this problem?

That was the previous approach. The problem is that constructing an at::TensorImpl is fairly heavy, so that route was not efficient enough.
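For reference, a minimal sketch of that earlier wrapper-subclass idea, assuming a PyTorch 2.x-style c10::TensorImpl where device queries can be routed through the virtual device_custom() hook; the class name is hypothetical and the real DIOPI wrapper differs. It also shows why the approach is costly: every buildATen call pays for a full TensorImpl construction.

#include <c10/core/TensorImpl.h>

// Hypothetical wrapper that reports a fixed device for an existing storage.
class FixedDeviceTensorImpl : public c10::TensorImpl {
 public:
  FixedDeviceTensorImpl(c10::Storage storage, c10::DispatchKeySet keys,
                        caffe2::TypeMeta dtype, c10::Device device)
      : c10::TensorImpl(std::move(storage), keys, dtype), device_(device) {
    // Ask TensorImpl to answer device() via the device_custom() override.
    set_custom_device(true);
  }

  c10::Device device_custom() const override { return device_; }

 private:
  c10::Device device_;
};

Constructing one of these per DIOPI call (refcount header, sizes/strides, dispatch keys) is exactly the overhead the reinterpret_cast fast path in this PR avoids.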

@lljbash requested a review from yangbofun on June 25, 2024.
@yangbofun merged commit de8dfe7 into DeepLink-org:main on June 26, 2024 (14 of 16 checks passed).
@yangbofun deleted the llj/buildaten branch on June 26, 2024.