About the problem formulation and evaluation results #7

Open
MasterIzumi opened this issue Apr 19, 2024 · 0 comments

Comments

@MasterIzumi

Thanks for your great work!

I am wondering why GPT-Driver, which is based on an LLM, outperforms systems designed specifically for motion planning (e.g., NMP and UniAD in Table I). If I understand correctly, you provide only limited information to the LLM, while some important driving context is ignored (e.g., lane structure, traffic signals). Moreover, taking visual grounding as an example: although VLMs can detect and locate objects in an image, their accuracy is not as good as that of dedicated object detectors. So, in my view, LLMs/VLMs are not yet good at fine-grained spatial tasks.

Could you provide some insight into what drives this superior performance? Thanks!
