cuda driver version #83
Comments
Have you tried the forward compatibility mode described here?
Please reference #46.
@Kelsey2018 We noticed that issue and have fixed it in the internal codebase. We plan to release an update of the main branch by the end of this week or early next week. Until then, please try the changes in #46 as a workaround. Thank you.
Thanks a lot for replying! The performance of TensorRT-LLM is commendable; however, I am curious about the distinctive features that set it apart from the vLLM acceleration strategy. Both approaches employ paged attention (which I understand to be equivalent to the Paged KV Cache in TRT-LLM), along with dynamic batching to efficiently handle multiple requests. Will there be additional documentation in the future that elucidates the unique attributes of TensorRT-LLM?
The most fundamental technical difference is that TensorRT-LLM relies on TensorRT, which is a graph compiler that can produce optimised kernels for your graph. As we continue to improve TensorRT, there will be less and less need for "manual" interventions to optimise new networks (in terms of kernels as well as taking advantage of numerical optimizations like INT4, INT8 or FP8). I hope it helps a bit.
This issue has been fixed in the main branch, so I will close it; feel free to reopen it anytime if you have new questions. Thanks!
I encountered this error: "ERROR: This container was built for NVIDIA Driver Release 535.86 or later, but version 520.56.06 was detected and compatibility mode is UNAVAILABLE."
So can TensorRT-LLM support lower driver versions?
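For illustration, the check behind that error message boils down to a dotted version comparison: the container requires driver 535.86 or later, and 520.56.06 sorts below it. The sketch below is only illustrative; `parse_version` and `driver_is_sufficient` are hypothetical helper names, not the container's actual entrypoint code.

```python
def parse_version(v: str) -> tuple:
    """Turn a dotted driver version string like '535.86' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def driver_is_sufficient(detected: str, required: str) -> bool:
    """True if the detected driver meets or exceeds the required release."""
    # Tuple comparison is lexicographic, matching how driver releases order.
    return parse_version(detected) >= parse_version(required)

# The versions from the error message above: an older driver fails the check.
print(driver_is_sufficient("520.56.06", "535.86"))  # False
```

When the check fails, the forward compatibility mode mentioned earlier (running a newer CUDA toolkit on an older data-center driver via the compat package) is the usual way out, where supported.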