[Documentation] Detail when Myelin is used and why #2576
Comments
cc @zhenhuaw-me
also cc @nvpohanh
Myelin is the name of a graph compilation and execution backend that was integrated into TensorRT. Myelin gives TensorRT the capability to do aggressive pointwise-op fusions and MHA (multi-head attention) fusions, which are commonly needed for Transformer-like models.
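To illustrate what pointwise-op fusion buys, here is a conceptual NumPy sketch (this is not TensorRT or Myelin code; the bias-add + GELU chain is just one common Transformer pattern). Unfused, each pointwise op is a separate pass that materializes a full intermediate tensor; fused, both ops are computed per element in a single pass, so the intermediate never touches memory.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, a common pointwise op in Transformers
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def unfused(x, bias):
    # Two separate "kernels": the bias-add result is materialized in full
    # before the activation pass reads it back.
    t = x + bias        # pass 1: pointwise add, writes an intermediate tensor
    return gelu(t)      # pass 2: pointwise activation, reads it again

def fused(x, bias):
    # One "kernel": bias-add and GELU computed together, element by element.
    # The intermediate value lives only in a scalar, never as a tensor.
    out = np.empty_like(x)
    flat_x = x.ravel()
    flat_b = np.broadcast_to(bias, x.shape).ravel()
    flat_o = out.ravel()  # view into `out` (contiguous), so writes land in `out`
    for i in range(flat_x.size):
        t = flat_x[i] + flat_b[i]   # no intermediate tensor is stored
        flat_o[i] = gelu(t)
    return out
```

On real hardware the fused version wins because pointwise ops are memory-bandwidth bound: one read and one write of the data instead of one per op. (In pure Python the loop is of course slower; the sketch only shows the dataflow difference.)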
No, because disabling Myelin would lead to poorer performance, since many pointwise-op fusions would be disabled.
The long-term goal is to have the Myelin backend consume everything that is passed to TensorRT. Until that happens, TensorRT uses heuristics to decide whether to enable this special backend. For example, if the MHA pattern exists in the network, TensorRT will try to offload everything to this backend.
With GPT-J-6B, we expect the entire graph (not just the layers you circled) to be consumed by the Myelin backend. In the verbose log, it appears that the autotuning is slow, but in fact that is because Myelin is doing autotuning for ALL the layers; the details are just not printed to TensorRT verbose logging yet. We are still working on this part. In summary, it is normal that building GPT-J-6B takes ~10 minutes, since this is a pretty large model and TensorRT (and the Myelin backend) needs to try all types of gemm kernels to select the fastest ones. In TensorRT 8.6, we will add an option to skip the autotuning step, but the resulting performance may degrade.
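The autotuning step described above can be sketched conceptually (plain Python, not TensorRT's actual tactic selection; the candidate "kernels" here are illustrative stand-ins): for each op, the builder times every candidate implementation on the target shapes and keeps the fastest, which is why build time grows with model size and with the number of kernels to try.

```python
import time
import numpy as np

def matmul_naive(a, b):
    # Candidate "tactic" 1: triple loop (deliberately slow stand-in)
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            out[i, j] = s
    return out

def matmul_blas(a, b):
    # Candidate "tactic" 2: vendor BLAS via NumPy
    return a @ b

def autotune(candidates, a, b, repeats=3):
    # Time each candidate on the actual shapes and return the fastest,
    # analogous to what a builder does per layer during engine build.
    best, best_t = None, float("inf")
    for fn in candidates:
        t0 = time.perf_counter()
        for _ in range(repeats):
            fn(a, b)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best, best_t = fn, elapsed
    return best

a = np.random.rand(64, 64)
b = np.random.rand(64, 64)
fastest = autotune([matmul_naive, matmul_blas], a, b)
```

Skipping this timing step (as the TensorRT 8.6 option allows) makes the build fast but means the builder must fall back on a default choice, which may not be the fastest kernel for your shapes.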
@nvpohanh Thank you for your detailed answer! It's very clear. Is it fair to say that Myelin is the spiritual father of torchinductor, on steroids?
Myelin is a TensorRT internal component whose behavior is not publicly guaranteed. We currently have no plan to reveal details of Myelin, i.e. there will be no documentation. As with other undocumented TensorRT behavior, please don't depend on it or make any assumptions about it; otherwise you might see unexpected failures when upgrading TensorRT.
Closing this since we don't plan to document Myelin yet. Please let me know if you have any further questions.
Understood, thanks for getting back on the question! |
Description
Following: #2308 (comment)
Searching "Myelin" in https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html and https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/index.html yields no results.
The documentation should answer questions such as:
Related: huggingface/optimum#605
Examples where documentation would help
My log (this bit is super slow, when building the engine for https://huggingface.co/EleutherAI/gpt-j-6B ):
Which likely corresponds to the following part of my graph:
Users can try to guess why Myelin is used, but without documentation this is hard.
For the model https://huggingface.co/anton-l/gpt-j-tiny-random , I have an issue at a different part of the model, still involving a Cast:
Again, it's unclear what's wrong.
Yet another one (after using `onnxsim --skip-constant-folding` on https://huggingface.co/anton-l/gpt-j-tiny-random):