
[Documentation] Detail when Myelin is used and why #2576

Closed
fxmarty opened this issue Dec 30, 2022 · 7 comments
Labels
triaged Issue has been triaged by maintainers


@fxmarty

fxmarty commented Dec 30, 2022

Description

Following: #2308 (comment)

Searching "Myelin" in https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html and https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/index.html yields no result.

The doc should answer questions as:

  • What is Myelin?
  • Is it possible to disable Myelin? If no, why not?
  • Which nodes are consumed by Myelin and which are not? Under which conditions?

Related: huggingface/optimum#605

Examples where a documentation would help

My log (this bit is super slow when building the engine for https://huggingface.co/EleutherAI/gpt-j-6B):

[12/30/2022-15:39:09] [V] [TRT] --------------- Timing Runner: {ForeignNode[transformer.h.0.attn.bias.../Cast]} (Myelin)
[12/30/2022-15:48:29] [V] [TRT] Tactic: 0x0000000000000000 Time: 17.8375
[12/30/2022-15:48:30] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 17.8375
[12/30/2022-15:48:30] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Myelin Tactic: 0x0000000000000000
[12/30/2022-15:48:30] [V] [TRT] Formats and tactics selection completed in 567.208 seconds.
...
[12/30/2022-15:48:47] [V] [TRT] Engine generation completed in 584.689 seconds.

Which likely corresponds to the following part of my graph:

[Screenshot of the corresponding part of the graph (image not preserved)]

The user can try to guess why Myelin is used, but without documentation this is hard.

For the model https://huggingface.co/anton-l/gpt-j-tiny-random , I have an issue at a different part of the model, still involving a Cast:

[12/30/2022-17:37:33] [V] [TRT] --------------- Timing Runner: {ForeignNode[transformer.h.0.ln_1.bias.../Cast]} (Myelin)
[12/30/2022-17:37:56] [V] [TRT] Tactic: 0x0000000000000000 Time: 1.23026
[12/30/2022-17:37:56] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 1.23026
[12/30/2022-17:37:56] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Myelin Tactic: 0x0000000000000000
[12/30/2022-17:37:56] [V] [TRT] Formats and tactics selection completed in 28.5682 seconds.
...
[12/30/2022-17:37:56] [V] [TRT] Engine generation completed in 29.8719 seconds.

[Screenshot of the corresponding part of the graph (image not preserved)]

Again, it's unclear what's wrong.

Yet another one (after using onnxsim --skip-constant-folding on https://huggingface.co/anton-l/gpt-j-tiny-random):

[12/30/2022-19:01:01] [V] [TRT] --------------- Timing Runner: {ForeignNode[(Unnamed Layer* 5216) [Shuffle].../lm_head/Add]} (Myelin)
[12/30/2022-19:01:17] [V] [TRT] Tactic: 0x0000000000000000 Time: 1.39995
[12/30/2022-19:01:17] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 1.39995
[12/30/2022-19:01:17] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Myelin Tactic: 0x0000000000000000
[12/30/2022-19:01:17] [V] [TRT] Formats and tactics selection completed in 19.3197 seconds.
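All three logs share the same structure: a `Timing Runner` line naming a `ForeignNode[...]` handed to Myelin, then the tactic result, then a `Chose Runner Type` line. The per-tactic `Time:` field does not tell you how long the block took to build, but the line timestamps do. Below is a minimal Python sketch for pulling that out of a verbose log, assuming the `[MM/DD/YYYY-HH:MM:SS]` prefix and the line wording shown above (TensorRT's verbose log format is not a documented interface and may change between versions). The names `runner_timings`, `parse_ts` and `SAMPLE_LOG` are hypothetical.

```python
import re
from datetime import datetime

# "[MM/DD/YYYY-HH:MM:SS]" prefix of each verbose log line (as in the logs above).
TS_RE = re.compile(r"^\[(\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2})\]")
# Start of a timing block: "Timing Runner: <node> (<runner type>)".
RUNNER_RE = re.compile(r"Timing Runner: (.*) \((\w+)\)\s*$")
# Line printed once a runner has been selected for the node.
CHOSE_RE = re.compile(r"Chose Runner Type: (\w+)")


def parse_ts(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%m/%d/%Y-%H:%M:%S") if m else None


def runner_timings(lines):
    """Return (node, runner_type, seconds) for each "Timing Runner" block.

    A block is assumed to end at the next "Timing Runner" line or the next
    "Chose Runner Type" line, whichever comes first.
    """
    results = []
    current = None  # (node, runner_type, start) of the block being timed
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # e.g. the "..." elisions in the excerpts above
        runner = RUNNER_RE.search(line)
        if current and (runner or CHOSE_RE.search(line)):
            node, kind, start = current
            results.append((node, kind, (ts - start).total_seconds()))
            current = None
        if runner:
            current = (runner.group(1), runner.group(2), ts)
    return results


SAMPLE_LOG = """\
[12/30/2022-15:39:09] [V] [TRT] --------------- Timing Runner: {ForeignNode[transformer.h.0.attn.bias.../Cast]} (Myelin)
[12/30/2022-15:48:29] [V] [TRT] Tactic: 0x0000000000000000 Time: 17.8375
[12/30/2022-15:48:30] [V] [TRT] Fastest Tactic: 0x0000000000000000 Time: 17.8375
[12/30/2022-15:48:30] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Myelin Tactic: 0x0000000000000000
"""

if __name__ == "__main__":
    # Print the slowest blocks first.
    for node, kind, secs in sorted(runner_timings(SAMPLE_LOG.splitlines()),
                                   key=lambda r: -r[2]):
        print(f"{secs:7.0f} s  {kind:<8} {node}")
```

On the GPT-J-6B excerpt this attributes 561 s (15:39:09 to 15:48:30) to the single Myelin ForeignNode, i.e. almost all of the 567 s reported for format and tactic selection. The timestamps have one-second resolution, so short blocks show up as 0 s.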
@zerollzeng
Collaborator

@zhenhuaw-me ^ ^

zerollzeng added the "triaged" label (Issue has been triaged by maintainers) on Jan 3, 2023
@zerollzeng
Collaborator

also cc @nvpohanh

@nvpohanh
Collaborator

nvpohanh commented Jan 4, 2023

What is Myelin?

Myelin is the name of a graph compilation and execution backend which was integrated into TensorRT. Myelin provides TRT with the capability of doing aggressive pointwise op fusions and MHA fusions which are commonly used in Transformer-like models.
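As a rough illustration of what pointwise op fusion means in general (not of how Myelin implements it, which the maintainers say is not public), consider a bias-add, a scale and a GELU applied one after another. This pattern is chosen here as a hypothetical example. Unfused, each op is a separate pass that writes a full intermediate buffer. Fused, every element is read once and all three ops are applied before a single write. The Python sketch below only models the data movement with lists; on a GPU the gain comes from fewer kernel launches and less memory traffic.

```python
import math


def gelu(x: float) -> float:
    # tanh approximation of GELU, a common Transformer activation
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def unfused(x, bias, scale):
    # Three separate passes ("kernels"); each writes a full intermediate buffer.
    t1 = [a + b for a, b in zip(x, bias)]  # pass 1: bias add
    t2 = [a * scale for a in t1]           # pass 2: scale
    return [gelu(a) for a in t2]           # pass 3: activation


def fused(x, bias, scale):
    # One pass: each element is read once, all three ops are applied,
    # and only the final value is stored.
    return [gelu((a + b) * scale) for a, b in zip(x, bias)]


if __name__ == "__main__":
    x, bias = [0.5, -1.0, 2.0], [0.1, 0.2, -0.3]
    # Same arithmetic in the same order, so the results match exactly.
    print(unfused(x, bias, 0.5) == fused(x, bias, 0.5))
```

Fusion changes where intermediates live, not what is computed, which is why a fusing backend can be swapped in without changing results (up to any reordering of floating-point operations it chooses to do).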

Is it possible to disable Myelin? If no, why not?

No, because disabling Myelin would lead to poorer performance, since many pointwise op fusions would be disabled.

Which nodes are consumed by Myelin and which are not? Under which conditions?

The long-term goal is to have Myelin backend consume everything that is passed to TensorRT. Before that happens, TensorRT has some heuristics in deciding whether to enable this special backend. For example, if the MHA pattern exists in the network, TensorRT will try to offload everything to this backend.

this bit is super slow, when building the engine for https://huggingface.co/EleutherAI/gpt-j-6B

With GPT-j-6B, we expect the entire graph (not just the layers you circled) to be consumed by the Myelin backend. In the verbose log, it appears that the autotuning is slow, but in fact that is because Myelin is doing autotuning for ALL the layers. It is just that the details are not printed to TensorRT verbose logging yet. We are still working on this part.

In summary, it is normal that building GPT-J-6B takes ~10 mins, since this is a pretty large model and TensorRT (and the Myelin backend) needs to try all types of GEMM kernels to select the fastest ones. In TensorRT 8.6, we will add an option for you to skip the autotuning step, but the resulting performance may degrade.
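The build logs quoted in the issue are consistent with this: nearly all of the engine build time goes to format and tactic selection. A quick check using only the two pairs of numbers reported above (the third log has no `Engine generation completed` line, so it is left out); `selection_fraction` and `BUILDS` are hypothetical names:

```python
def selection_fraction(tactic_selection_s: float, engine_generation_s: float) -> float:
    """Share of the engine build spent in format/tactic selection."""
    return tactic_selection_s / engine_generation_s


# (seconds in format/tactic selection, total engine generation seconds),
# copied from the verbose logs quoted in the issue.
BUILDS = {
    "EleutherAI/gpt-j-6B": (567.208, 584.689),
    "anton-l/gpt-j-tiny-random": (28.5682, 29.8719),
}

if __name__ == "__main__":
    for model, (tactics, total) in BUILDS.items():
        print(f"{model}: {selection_fraction(tactics, total):.1%}")
```

This gives about 97% for GPT-J-6B and about 96% for the tiny random model.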

@fxmarty
Author

fxmarty commented Jan 4, 2023

@nvpohanh Thank you for your detailed answer! It's very clear.

Is it fair to say that Myelin is the spiritual father of torchinductor, on steroids?

@zhenhuaw-me
Member

Myelin is an internal TensorRT component whose behavior is not publicly guaranteed. We currently have no plans to reveal details of Myelin, i.e. there will be no documentation. As with other undocumented TensorRT behavior, please don't depend on it or make any assumptions about it; otherwise you might see unexpected failures when upgrading TensorRT.

@zhenhuaw-me
Member

Closing this since we don't plan to document Myelin yet. Please let me know if you have any further questions.

@fxmarty
Author

fxmarty commented Jan 11, 2023

Understood, thanks for getting back to me on this!


4 participants