
[BYOC] [TPAT] [TensorRT] Add the ability to automatically generate TensorRT plugins using TVM #15526

Status: Closed · wants to merge 14 commits

Conversation

@Civitasv (Contributor) commented Aug 11, 2023

TPAT: TVM Plugin Autogen Tool

Disclaimer: This PR is based on Tencent's TPAT.

Purpose: Tencent's TPAT must be used with their TVM fork, BlazerML-tvm, which has not been synchronized with upstream for a long time, and some bugs remain unresolved. In light of these issues, I decided to try integrating it into TVM.

Objective: The primary goal is to offer a clear and user-friendly API.

Architecture

(Architecture diagram omitted.)

Currently, only TensorRT is supported.

In essence, this solution is built upon the Template Engine (Jinja) in Python to create plugin templates for vendor-specific acceleration libraries. It then utilizes TVM for optimization and code generation targeting the respective platforms. The generated code is rendered and filled into the templates. Subsequently, platform-specific build commands are invoked to build the plugins, which ultimately serve as extensions for the corresponding vendor's acceleration library.
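The render-and-fill step described above can be sketched roughly as follows. This is a minimal illustration using the standard-library `string.Template` in place of Jinja (which the PR actually uses); the template text, placeholder names, plugin class name, and kernel string are all hypothetical and not taken from the PR's real templates.

```python
from string import Template

# Hypothetical skeleton of a vendor-specific (TensorRT-style) plugin source,
# with a hole where TVM-generated device code gets pasted in.
PLUGIN_TEMPLATE = Template("""
class ${plugin_name} : public IPluginV2DynamicExt {
  // TVM-generated code for the target platform goes below.
  ${generated_kernel}
};
""")

def render_plugin(plugin_name: str, generated_kernel: str) -> str:
    """Fill TVM's generated code into the plugin template."""
    return PLUGIN_TEMPLATE.substitute(
        plugin_name=plugin_name, generated_kernel=generated_kernel
    )

# Example: render a (made-up) plugin for a OneHot node.
source = render_plugin(
    "TpatOneHot", "__global__ void onehot_kernel(...) { /* ... */ }"
)
```

After rendering, the resulting source would be handed to the platform-specific build command (e.g. nvcc for TensorRT) to produce the plugin library.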

Inputs & Outputs

The entry of TPAT for TensorRT is as follows:

```python
def pipeline(
    onnx_file: str, node_names: list[str], enable_tunning: bool, work_dir: str, output_onnx: str
) -> Tuple[str, list[str]]:
```

This entry point accepts an ONNX file, a list of node names for which plugins should be generated, a flag enabling tuning, a working directory for tuning logs, and the output ONNX file path where the modified model will be stored.

After generating plugins for each node, the function returns the path of the output ONNX file along with a list of paths where the plugins are saved, facilitating subsequent loading.
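A hedged sketch of how a caller might consume the inputs and outputs described above. The `pipeline` body here is a stub standing in for the real entry point (in the PR it would be imported from the TPAT module), and the file names it returns are invented for illustration.

```python
from typing import List, Tuple

def pipeline(onnx_file: str, node_names: List[str], enable_tunning: bool,
             work_dir: str, output_onnx: str) -> Tuple[str, List[str]]:
    # Stub: the real implementation generates a TensorRT plugin per node,
    # rewrites the model to use them, and writes it to `output_onnx`.
    return output_onnx, [f"{work_dir}/tpat_{n}.so" for n in node_names]

output_model, plugin_paths = pipeline(
    "model.onnx", ["OneHot_7"], enable_tunning=False,
    work_dir="./tpat_work", output_onnx="model_tpat.onnx",
)

# Each returned plugin library would then be loaded (e.g. via ctypes.CDLL)
# before building the TensorRT engine, so TensorRT can find the plugins.
for path in plugin_paths:
    print("would load plugin:", path)
```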

TODO

  • Users should have the ability to change tuning options.
  • Add a benchmark section.
  • Currently, the frontend is Relay and the tuning method is MetaSchedule; we should provide a flexible way to support Relax and other tuning methods.
  • Consider potential improvements to the API on the C++ side. Currently I use some global variables and register global functions to retrieve them, which feels like a hack to me. I'm not very familiar with TVM's idiomatic way to do this, so please give me some advice.
  • Explore dynamic batch support. Currently only static batch is supported; the original repo supports dynamic batch, but that implementation is a bit messy, and I believe there is a more elegant way to support this feature.
  • Investigate the generation of QNN plugins for Qualcomm platforms.


@tvm-bot (Collaborator) commented Aug 11, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@Civitasv (Author) commented Aug 11, 2023

cc @tqchen @Hzfengsy @FrozenGene

@Hzfengsy (Member) commented

Thanks, @Civitasv, for this great work! A few notable things:

  1. This improvement is based on Relay, not Relax, so it should be sent to the main branch instead of the unity branch.
  2. It's an awesome and big feature; having an RFC (https://github.com/apache/tvm-rfcs) before the PR would be good.
  3. This PR is a bit large to review; could it be separated into several smaller ones, together with a tracking issue, after the RFC?

@Civitasv (Author) commented

> This is the improvement based on Relay, not Relax, so it should be sent to main branch instead of the unity branch

The final goal is to support both Relay and Relax, but I agree that currently it should be sent to the main branch.

Okay, I will write an RFC.

> This PR is a bit large to review, could it be separated into several small ones, together with a tracking issue after the RFC

I will try to separate it.

@Civitasv (Author) commented

I've already proposed an RFC. See apache/tvm-rfcs#103.

@buptqq commented Aug 14, 2023

> I've already proposed an RFC. See apache/tvm-rfcs#103.

Hi, I am the author of TPAT. If you need any help, you can contact me through this email: qianqiu@tencent.com

@Civitasv (Author) commented

> Hi, I am the author of TPAT. If you need any help, you can contact me through this email: qianqiu@tencent.com

@buptqq Thanks for your great work! It has helped me a lot. If you are still working on this project, could you review the code? I've changed a lot.

@Civitasv (Author) commented

I've improved the code; the workflow should be clear if you've read the RFC. 😄

@buptqq commented Aug 21, 2023

> @buptqq Thanks for your great work! It has helped me a lot. If you are still working on this project, could you review the code? I've changed a lot.

OK, I will review this code.
