AIS-Clemson/VisionGPT


VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation

Use

Clone or download the repository, replace the API key in OPENAI_API_KEY.yaml with your own, and run demo.py.

Demonstration

Framework

See our other project, H-Splitter, for movement prediction.

Overview

This project explores the potential of Large Language Models (LLMs) in zero-shot anomaly detection for safe visual navigation.

With the assistance of the state-of-the-art real-time open-world object detection model Yolo-World and specialized prompts, the proposed framework identifies anomalies within camera-captured frames, including any possible obstacles. It then generates concise, audio-delivered descriptions that emphasize abnormalities, assisting safe visual navigation in complex circumstances.

Framework

Moreover, the proposed framework leverages the strengths of LLMs and the open-vocabulary object detection model to achieve dynamic scenario switching. This allows users to transition smoothly from scene to scene and addresses a limitation of traditional visual navigation.

Furthermore, this project explores the performance contribution of different prompt components, provides a vision for future improvements in visual accessibility, and paves the way for LLMs in video anomaly detection and vision-language understanding.

Method

Yolo-World

We apply the latest Yolo-World model for open-world object detection so that the system can adapt to any scenario or situation. The detection classes are generated by GPT-4 and can be replaced dynamically.
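As a rough illustration of replacing detection classes at run time, the sketch below parses a free-form LLM reply into a class list and hands it to a detector. The helper names `parse_class_list` and `switch_scenario` are hypothetical and do not come from this repository. The only assumption about the detector is that it exposes a `set_classes` method, as the Ultralytics `YOLOWorld` model does.

```python
import re


def parse_class_list(llm_reply: str) -> list[str]:
    """Turn a free-form LLM reply into a clean, de-duplicated class list."""
    # Split on commas, semicolons, or newlines.
    parts = re.split(r"[,;\n]+", llm_reply)
    classes = []
    seen = set()
    for part in parts:
        # Drop list markers such as "1." or "-", then normalize case.
        name = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", part).strip().lower()
        if name and name not in seen:
            seen.add(name)
            classes.append(name)
    return classes


def switch_scenario(detector, llm_reply: str) -> list[str]:
    """Replace the detector's vocabulary with classes parsed from the reply.

    `detector` is any object with a `set_classes(list_of_names)` method
    (an assumption; the Ultralytics YOLOWorld model provides one).
    """
    classes = parse_class_list(llm_reply)
    if classes:  # keep the previous vocabulary if the reply was empty
        detector.set_classes(classes)
    return classes
```

Keeping the parsing separate from the detector call makes it possible to check the class list before it reaches the model, which may be useful when the LLM reply is not strictly formatted.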

Detection classes

GPT-3.5

We use GPT-3.5 for its fast response and low cost. We also tested GPT-4 and GPT-4V but found them too expensive for this application.

Sample results

H-splitter

We implemented an H-splitter to assist object detection and categorize objects into three types based on priority.

H-splitter

See our other project, H-Splitter, for more information.

Experiments

We use Yolo-World with the H-splitter for universal object detection. For any object that falls (a) in Area 3 or (b) in Area 1 or 2 with 15% of the window size, we record the corresponding frame as an anomaly. We use this Yolo-World-H setting as the ground truth for the benchmark.
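The labeling rule above can be sketched as a small predicate. This is an illustration under stated assumptions, not the repository's implementation. It assumes the H-splitter has already assigned each detection an area number, and it reads "15% of window size" as the bounding-box area being at least 15% of the frame area. The `Detection` type and function names are hypothetical.

```python
from dataclasses import dataclass

SIZE_THRESHOLD = 0.15  # 15% of the window size, taken from the rule above


@dataclass
class Detection:
    label: str
    area_id: int   # H-splitter area (1, 2, or 3); assumed to be assigned upstream
    box_w: float   # bounding-box width in pixels
    box_h: float   # bounding-box height in pixels


def is_anomalous(det: Detection, frame_w: int, frame_h: int) -> bool:
    """Apply the rule: Area 3 always counts; Area 1/2 only when large enough."""
    if det.area_id == 3:
        return True
    if det.area_id in (1, 2):
        # Assumed interpretation: box area relative to the frame area.
        ratio = (det.box_w * det.box_h) / (frame_w * frame_h)
        return ratio >= SIZE_THRESHOLD
    return False


def frame_is_anomaly(detections, frame_w: int, frame_h: int) -> bool:
    """A frame is recorded as an anomaly if any detection triggers the rule."""
    return any(is_anomalous(d, frame_w, frame_h) for d in detections)
```

For a 640x480 frame, a 100x100 box in Area 1 covers about 3% of the frame and is ignored, while a 320x240 box in the same area covers 25% and marks the frame as an anomaly.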

System Sensitivity

We preset the system with three sensitivity levels for reporting emergencies: low, normal, and high. We find that low sensitivity works well for daily use because of its low false alarm rate.

System Test

Detection accuracy

We compare VisionGPT at low system sensitivity against the ground truth to evaluate its performance. We find that VisionGPT achieves high accuracy and tends to produce fewer false positives (unnecessary reports).
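One way to make this comparison concrete is to count frame-level agreement between the system's reports and the ground-truth labels. The sketch below assumes each frame has a boolean label from both sources. The function names are hypothetical, and the metrics are the standard confusion-matrix definitions rather than figures or code from this project.

```python
def confusion_counts(predicted, ground_truth):
    """Count frame-level outcomes; True means 'anomaly reported/labeled'."""
    if len(predicted) != len(ground_truth):
        raise ValueError("both label sequences must cover the same frames")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for pred, truth in zip(predicted, ground_truth):
        if pred and truth:
            counts["tp"] += 1
        elif pred and not truth:
            counts["fp"] += 1  # unnecessary report
        elif not pred and truth:
            counts["fn"] += 1  # missed anomaly
        else:
            counts["tn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy and false positive rate from the counts."""
    total = sum(counts.values())
    negatives = counts["fp"] + counts["tn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total if total else 0.0,
        "false_positive_rate": counts["fp"] / negatives if negatives else 0.0,
    }
```

Reporting the false positive rate alongside accuracy separates unnecessary reports from missed anomalies, which a single accuracy number would hide.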

Performance Test

System Performance & Compatibility

System performance

Cost Evaluation

Financial cost test

Acknowledgements:

Please cite our work if you find this project helpful.

@article{wang2024visiongpt,
  title={VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation},
  author={Wang, Hao and Qin, Jiayou and Bastola, Ashish and Chen, Xiwen and Suchanek, John and Gong, Zihao and Razi, Abolfazl},
  journal={arXiv preprint arXiv:2403.12415},
  year={2024}
}
