Exploring the Nexus for Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving
ECCV 2024 Autonomous Driving Workshop
Corner Case Scene Understanding Leaderboard
W-CODA 2024 Challenge Track 1
- Mengjingcheng Mo, Jingxin Wang, Like Wang, Haosheng Chen, Changjun Gu, Jiaxu Leng, Xinbo Gao
Chongqing University of Posts and Telecommunications
- 🔥 NexusAD introduces a multimodal perception and understanding framework based on InternVL-2.0, significantly improving detection, depth estimation, and reasoning abilities for complex scenarios through fine-tuning on the CODA-LM dataset.
- 🏁 NexusAD participated in the ECCV 2024 Autonomous Driving Workshop, focusing on multimodal scene understanding tasks in extreme driving scenarios, and competed in the W-CODA 2024 Challenge Track 1.
- 2024/08/15: NexusAD was submitted to the W-CODA 2024 Challenge Track 1 at the ECCV 2024 Autonomous Driving Workshop and achieved a final score of 68.97.
- 2024/08/15: The NexusAD team released the latest version of the code and LoRA weights.
Follow these steps to start using NexusAD:
- Clone the repository:

  ```bash
  git clone https://github.com/OpenVisualLab/NexusAD.git
  cd NexusAD
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the CODA-LM Dataset and place it in the specified directory.

- Download the LoRA Weights and place them in the `weights/` directory.

- Run the model:

  ```bash
  python preprocess.py --data_path <path-to-CODA-LM>
  python train.py --config config.json
  python evaluate.py --data_path <path-to-evaluation-set>
  ```
The NexusAD model architecture consists of the following components (illustrative sketches of each step follow the list):

- Preliminary Visual Perception: uses Grounding DINO for object detection and DepthAnything v2 for depth estimation, converting spatial information into easily understandable structured text.
- Scene-aware Enhanced Retrieval Generation: uses Retrieval-Augmented Generation (RAG) to retrieve and select relevant samples, enhancing understanding of complex driving scenarios.
- Driving Prompt Optimization: uses Chain-of-Thought (CoT) prompting to generate context-aware, structured driving suggestions.
- Fine-tuning: parameter-efficient fine-tuning with LoRA optimizes performance while conserving computational resources.
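The sketch below shows one way detection boxes and a depth map could be serialized into structured text for the perception step. The coordinates, depth handling, and text format are illustrative assumptions, not the repository's actual preprocessing code:

```python
import numpy as np

def describe_detections(boxes, labels, depth_map, image_width):
    """Convert detections plus a depth map into structured text lines.

    boxes: list of (x1, y1, x2, y2) pixel coordinates
    labels: class name per box
    depth_map: HxW array of per-pixel depth in meters
    """
    lines = []
    for (x1, y1, x2, y2), label in zip(boxes, labels):
        # Median depth inside the box is a robust distance estimate.
        patch = depth_map[int(y1):int(y2), int(x1):int(x2)]
        distance = float(np.median(patch))
        # Coarse horizontal position from the box center.
        cx = (x1 + x2) / 2
        position = ("left" if cx < image_width / 3
                    else "right" if cx > 2 * image_width / 3
                    else "center")
        lines.append(f"{label}: {position}, ~{distance:.1f} m ahead")
    return "\n".join(lines)

# Toy example with a synthetic depth map.
depth = np.full((480, 640), 20.0)
depth[200:400, 100:300] = 8.5  # a nearby object region
print(describe_detections([(100, 200, 300, 400)], ["pedestrian"], depth, 640))
```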
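For the retrieval step, a minimal cosine-similarity selection over precomputed scene embeddings might look like the following; the embedding source and the `top_k_similar` helper are hypothetical stand-ins for the actual RAG pipeline:

```python
import numpy as np

def top_k_similar(query_emb, corpus_embs, k=3):
    """Return indices of the k most similar corpus embeddings (cosine)."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(scores)[::-1][:k]

# Toy corpus: 5 precomputed scene embeddings of dimension 4.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(5, 4))
query = corpus[2] + 0.05 * rng.normal(size=4)  # near sample 2
print(top_k_similar(query, corpus, k=2))  # sample 2 should rank first
```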
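The CoT prompt could then be assembled from the structured scene text and the retrieved exemplars roughly as below; the template wording is an illustrative assumption, not the prompt used in the paper:

```python
def build_cot_prompt(scene_text, exemplars):
    """Assemble a chain-of-thought prompt from perception text and exemplars."""
    parts = ["You are a driving assistant. Reason step by step before answering."]
    for i, (ex_scene, ex_answer) in enumerate(exemplars, 1):
        parts.append(f"Example {i}:\nScene: {ex_scene}\nAnswer: {ex_answer}")
    parts.append(
        "Now analyze the current scene.\n"
        f"Scene: {scene_text}\n"
        "Step 1: List the objects that affect driving.\n"
        "Step 2: Explain how each one constrains the ego vehicle.\n"
        "Step 3: Give a structured driving suggestion."
    )
    return "\n\n".join(parts)

print(build_cot_prompt("pedestrian: left, ~8.5 m ahead",
                       [("cone: center, ~5 m", "Slow down and steer right.")]))
```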
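Finally, a minimal LoRA setup with the Hugging Face `peft` library is sketched below. The base model (GPT-2 as a small stand-in for InternVL-2.0) and all hyperparameters are illustrative, not the authors' configuration:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base model is a stand-in; NexusAD fine-tunes InternVL-2.0.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hyperparameters here are illustrative, not the authors' settings.
config = LoraConfig(
    r=8,                        # low-rank dimension of the adapters
    lora_alpha=16,              # scaling factor applied to adapter output
    target_modules=["c_attn"],  # attention projection layers in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Because only the low-rank adapter weights are updated, the released LoRA weights in `weights/` stay small compared to the full backbone.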
In the ECCV 2024 Corner Case Understanding task, NexusAD outperformed baseline models, achieving a final score of 68.97:
| Model | General Perception | Regional Perception | Driving Suggestions | Final Score |
|---|---|---|---|---|
| GPT-4V | 57.50 | 56.26 | 63.30 | 59.02 |
| CODA-VLM | 55.04 | 77.68 | 58.14 | 63.62 |
| InternVL-2.0-26B | 43.39 | 64.91 | 48.04 | 52.11 |
| NexusAD (Ours) | 57.58 | 84.31 | 65.02 | 68.97 |
We welcome all forms of contributions! Please refer to CONTRIBUTING.md for details on how to participate.
This project is licensed under the MIT License.

If you find this project helpful in your research, please cite it as follows:
```bibtex
@article{mo2024nexusad,
  title={NexusAD: Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving},
  author={Mo, Mengjingcheng and Wang, Jingxin and Wang, Like and Chen, Haosheng and Gu, Changjun and Leng, Jiaxu and Gao, Xinbo},
  journal={ECCV 2024 Autonomous Driving Workshop},
  year={2024}
}
```
Special thanks to the following projects for providing key references and support for the development of NexusAD:
- InternVL: Provided crucial technical support for the development of multimodal vision-language models.
- CODA-LM: Provided datasets and resources for the corner case understanding task.