CoReVAD: A Contextual Reasoning Framework for Training-Free Video Anomaly Detection

Official PyTorch implementation of "CoReVAD: A Contextual Reasoning Framework for Training-Free Video Anomaly Detection" (ICPR 2026) by Hyeongmuk Lim and Youngbum Hur.

Abstract: Existing Video Anomaly Detection (VAD) methods typically rely on task-specific training, leading to strong domain dependency and high training costs. Moreover, most existing methods output only scalar anomaly scores, providing limited insight into why specific events are considered abnormal. Recent advances in Vision-Language Models (VLMs) have enabled both anomaly detection and human-interpretable reasoning. However, many VLM-based approaches still require additional training steps (e.g., instruction tuning or verbalized learning) or external Large Language Models (LLMs), incurring further training costs and inference overhead. To address these challenges, we propose CoReVAD, a contextual reasoning framework for training-free video anomaly detection that operates with a single frozen VLM. CoReVAD directly generates anomaly scores and temporal descriptions from the VLM. To mitigate noise in generative outputs, we introduce a Local Response Cleaning (LRC) module based on local vision-text alignment. Furthermore, global temporal context and progression are incorporated through softmax-based refinement, Gaussian smoothing, and position weighting. Experiments on UCF-Crime and XD-Violence demonstrate that CoReVAD achieves competitive performance among training-free methods while providing reliable and interpretable explanations.
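
To make the "Gaussian smoothing" step mentioned in the abstract more concrete, here is a minimal, generic sketch of smoothing a sequence of anomaly scores with a 1-D Gaussian kernel. It is not taken from this repository: the function names, the default `sigma`, the 3-sigma kernel radius, and the edge-replication border handling are all assumptions made for illustration, and the actual refinement in CoReVAD may differ.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_scores(scores, sigma=1.0):
    """Smooth per-segment anomaly scores with a Gaussian kernel.

    Hypothetical helper: borders are handled by clamping indices
    (edge replication), so a constant sequence stays constant.
    """
    if not scores:
        return []
    radius = max(1, int(3 * sigma))  # assumed 3-sigma truncation
    kernel = gaussian_kernel(sigma, radius)
    n = len(scores)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for offset, weight in zip(range(-radius, radius + 1), kernel):
            j = min(max(i + offset, 0), n - 1)  # clamp to valid range
            acc += weight * scores[j]
        smoothed.append(acc)
    return smoothed


if __name__ == "__main__":
    noisy = [0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.8, 0.1]
    print([round(s, 3) for s in smooth_scores(noisy, sigma=1.0)])
```

Smoothing of this kind damps isolated single-segment spikes while keeping sustained high-score regions, which is the usual motivation for applying it to generative scores.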

Data

Download the original videos for each dataset from the links below. The ground-truth (GT) annotations for both datasets are already included.

UCF-Crime: link
XD-Violence: link

The test video directory structure is as follows:

UCF-Crime
    └── videos
          ├── Abuse028_x264.mp4
          ├── Abuse030_x264.mp4
          └── ...
XD-Violence
    └── videos
          ├── A.Beautiful.Mind.2001__00-25-20_00-29-20_label_A.mp4
          ├── A.Beautiful.Mind.2001__00-40-52_00-42-01_label_A.mp4
          └── ...
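
Before running the pipeline, it can help to confirm that the videos landed in the layout shown above. The following is a small sketch that only assumes the `<dataset>/videos/*.mp4` structure from the tree; the helper name `list_test_videos` is hypothetical and not part of this repository.

```python
from pathlib import Path


def list_test_videos(dataset_root):
    """Return the sorted .mp4 files found under <dataset_root>/videos."""
    video_dir = Path(dataset_root) / "videos"
    if not video_dir.is_dir():
        raise FileNotFoundError(f"expected a 'videos' folder at {video_dir}")
    return sorted(p for p in video_dir.iterdir() if p.suffix.lower() == ".mp4")


if __name__ == "__main__":
    # Dataset folder names follow the tree above; adjust to your paths.
    for root in ("UCF-Crime", "XD-Violence"):
        try:
            print(root, len(list_test_videos(root)), "videos")
        except FileNotFoundError as err:
            print(root, "->", err)
```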

Setup

Clone the repo

git clone https://github.com/Muk-00/CoReVAD.git
cd CoReVAD
conda create --name CoReVAD python=3.9
conda activate CoReVAD
pip install -r requirements.txt

Install the environment

In this paper, we use InternVL2 as the VLM. To set up its environment, follow the official installation instructions provided by InternVL2 (link).

Extract CLIP features

We first extract CLIP vision features from each dataset.

For UCF-Crime (from the repository root):
cd src/ucf
python extract_clip_features.py

For XD-Violence (from the repository root):
cd src/xd
python extract_clip_features.py

Output (UCF-Crime):

src/ucf/CLIP_feats
    └── ucf_test
          ├── Abuse028_x264_CLIP_features.npy
          ├── Abuse030_x264_CLIP_features.npy
          └── ...

Output (XD-Violence):

src/xd/CLIP_feats
    └── xd_test
          ├── A.Beautiful.Mind.2001__00-25-20_00-29-20_label_A_CLIP_features.npy
          ├── A.Beautiful.Mind.2001__00-40-52_00-42-01_label_A_CLIP_features.npy
          └── ...
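
After extraction, each test video should have a matching `<video name>_CLIP_features.npy` file, as in the trees above. The sketch below checks for videos whose feature file is missing. It relies only on the naming pattern shown in this README; the helper names and the example paths in the usage comment are assumptions, so adjust them to wherever your videos and features actually live.

```python
from pathlib import Path

# Suffix taken from the output trees above.
FEATURE_SUFFIX = "_CLIP_features.npy"


def expected_feature_name(video_path):
    """Map a video file name to its expected feature file name."""
    # Path.stem drops only the final extension, so dotted names such as
    # "A.Beautiful.Mind.2001__..._label_A.mp4" are handled correctly.
    return Path(video_path).stem + FEATURE_SUFFIX


def find_missing_features(video_dir, feature_dir):
    """Return names of .mp4 files in video_dir without a feature file."""
    feature_dir = Path(feature_dir)
    missing = []
    for video in sorted(Path(video_dir).glob("*.mp4")):
        if not (feature_dir / expected_feature_name(video)).is_file():
            missing.append(video.name)
    return missing


# Example usage (paths are assumptions based on the trees in this README):
#   find_missing_features("UCF-Crime/videos", "src/ucf/CLIP_feats/ucf_test")
```

An empty list means every video has a feature file with the expected name; it does not validate the contents of the `.npy` files.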

Inference

1. Generate VLM response

The VLM responses are obtained in JSON format.

python generate_VLM_response.py

2. Local Response Cleaning (LRC)

The results of Local Response Cleaning (LRC) are saved in JSON format. We provide the generated responses in VLM_responses_LRC_ucf.json for UCF-Crime and VLM_responses_LRC_xd.json for XD-Violence.

python LRC.py

3. Evaluation

Evaluation for UCF-Crime dataset

python eval_ucf.py

Evaluation for XD-Violence dataset

python eval_xd.py
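
For orientation, results on UCF-Crime are commonly reported as frame-level AUC-ROC and results on XD-Violence as average precision (AP). This README does not state what `eval_ucf.py` and `eval_xd.py` compute internally, so the following is only a dependency-free sketch of those two metrics on binary frame labels and predicted scores; the function names are hypothetical and are not the repository's API.

```python
def roc_auc(labels, scores):
    """AUC-ROC via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 ground-truth values (1 = anomalous).
    scores: iterable of predicted anomaly scores, same length.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(labels, scores):
    """AP as the sum over thresholds of (recall gain) * precision."""
    order = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in order if y == 1)
    if n_pos == 0:
        raise ValueError("AP needs at least one positive label")
    tp = 0
    ap = 0.0
    i = 0
    while i < len(order):
        j = i
        group_tp = 0
        while j < len(order) and order[j][0] == order[i][0]:
            group_tp += 1 if order[j][1] == 1 else 0
            j += 1
        tp += group_tp
        ap += (group_tp / n_pos) * (tp / j)  # j = items scored so far
        i = j
    return ap


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y_true, y_score))            # 0.75
    print(round(average_precision(y_true, y_score), 4))  # 0.8333
```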

Acknowledgements

Thanks to Ye et al. for sharing their code.
