Thanks for your great work!
I am wondering why GPT-Driver, which is based on an LLM, outperforms systems designed specifically for the motion planning task (e.g., NMP and UniAD, Table I). If I understand correctly, you provide only limited information to the LLM, and some important driving context is ignored (e.g., lane structure and traffic signals). Moreover, taking visual grounding as an example: although VLMs can detect and locate objects in an image, their accuracy is not as good as that of dedicated object detectors. So in my view, LLMs/VLMs are currently not good at fine-grained spatial tasks.
Can you provide some insight into the superior performance? Thanks!