[Documentation] Detail when Myelin is used and why #2576
Comments
cc @zhenhuaw-me
also cc @nvpohanh
Myelin is the name of a graph compilation and execution backend that was integrated into TensorRT. Myelin gives TensorRT the capability to do aggressive pointwise-op fusions and MHA (multi-head attention) fusions, which are commonly needed for Transformer-like models.
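To illustrate what pointwise-op fusion buys, here is a conceptual NumPy sketch (this is not TensorRT or Myelin code; the bias-add + GELU chain is just one common Transformer pattern). Unfused, each pointwise op is a separate pass that materializes a full intermediate tensor; fused, both ops are computed per element in a single pass, so the intermediate never touches memory.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, a common pointwise op in Transformers
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def unfused(x, bias):
    # Two separate "kernels": the bias-add result is materialized in full
    # before the activation pass reads it back.
    t = x + bias        # pass 1: pointwise add, writes an intermediate tensor
    return gelu(t)      # pass 2: pointwise activation, reads it again

def fused(x, bias):
    # One "kernel": bias-add and GELU computed together, element by element.
    # The intermediate value lives only in a scalar, never as a tensor.
    out = np.empty_like(x)
    flat_x = x.ravel()
    flat_b = np.broadcast_to(bias, x.shape).ravel()
    flat_o = out.ravel()  # view into `out` (contiguous), so writes land in `out`
    for i in range(flat_x.size):
        t = flat_x[i] + flat_b[i]   # no intermediate tensor is stored
        flat_o[i] = gelu(t)
    return out
```

On real hardware the fused version wins because pointwise ops are memory-bandwidth bound: one read and one write of the data instead of one per op. (In pure Python the loop is of course slower; the sketch only shows the dataflow difference.)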
No, because disabling Myelin would lead to poorer performance, since many pointwise-op fusions would be disabled.
The long-term goal is to have the Myelin backend consume everything that is passed to TensorRT. Until that happens, TensorRT uses heuristics to decide whether to enable this special backend. For example, if the MHA pattern exists in the network, TensorRT will try to offload everything to this backend.
With GPT-J-6B, we expect the entire graph (not just the layers you circled) to be consumed by the Myelin backend. In the verbose log, it appears that the autotuning is slow, but in fact that is because Myelin is doing autotuning for ALL the layers; the details are just not printed to TensorRT verbose logging yet. We are still working on this part. In summary, it is normal that building GPT-J-6B takes ~10 minutes, since this is a pretty large model and TensorRT (and the Myelin backend) needs to try all types of gemm kernels to select the fastest ones. In TensorRT 8.6, we will add an option to skip the autotuning step, but the resulting performance may degrade.
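The autotuning step described above can be sketched conceptually (plain Python, not TensorRT's actual tactic selection; the candidate "kernels" here are illustrative stand-ins): for each op, the builder times every candidate implementation on the target shapes and keeps the fastest, which is why build time grows with model size and with the number of kernels to try.

```python
import time
import numpy as np

def matmul_naive(a, b):
    # Candidate "tactic" 1: triple loop (deliberately slow stand-in)
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i, p] * b[p, j]
            out[i, j] = s
    return out

def matmul_blas(a, b):
    # Candidate "tactic" 2: vendor BLAS via NumPy
    return a @ b

def autotune(candidates, a, b, repeats=3):
    # Time each candidate on the actual shapes and return the fastest,
    # analogous to what a builder does per layer during engine build.
    best, best_t = None, float("inf")
    for fn in candidates:
        t0 = time.perf_counter()
        for _ in range(repeats):
            fn(a, b)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best, best_t = fn, elapsed
    return best

a = np.random.rand(64, 64)
b = np.random.rand(64, 64)
fastest = autotune([matmul_naive, matmul_blas], a, b)
```

Skipping this timing step (as the TensorRT 8.6 option allows) makes the build fast but means the builder must fall back on a default choice, which may not be the fastest kernel for your shapes.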
@nvpohanh Thank you for your detailed answer! It's very clear. Is it fair to say that Myelin is the spiritual father of torchinductor, on steroids?
Myelin is a TensorRT internal component whose behavior is not publicly guaranteed. We currently have no plan to reveal details of Myelin, i.e. there will be no documentation. As with other undocumented TensorRT behavior, please don't depend on it or make any assumptions about it; otherwise you might see unexpected failures when upgrading TensorRT.
Closing this since we don't plan to document Myelin yet. Please let me know if you have any further questions.
Understood, thanks for getting back on the question! |
Description
Following: #2308 (comment)
Searching "Myelin" in https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html and https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/index.html yields no results.
The documentation should answer questions such as:
Related: huggingface/optimum#605
Examples where documentation would help
My log (this bit is super slow, when building the engine for https://huggingface.co/EleutherAI/gpt-j-6B ):
Which likely corresponds to the following part of my graph:
Users can try to guess why Myelin is used, but without documentation this is hard.
For the model https://huggingface.co/anton-l/gpt-j-tiny-random , I have an issue at a different part of the model, still involving a Cast:
Again, it's unclear what's wrong.
Yet another one (after using `onnxsim --skip-constant-folding` on https://huggingface.co/anton-l/gpt-j-tiny-random):