This repository provides the implementation of Zero-Shot Fake Video Detection by Audio-Visual Consistency, a content-consistency-based method for detecting fake videos. Our approach uses the FakeAVCeleb and DFDC datasets and builds upon the auto-avsr framework.
To get started, you'll need to prepare your datasets:
1. **Download Datasets:** Download the FakeAVCeleb and DFDC datasets.

2. **Pre-processing:** Our pre-processing pipeline, adapted from auto-avsr, ensures consistent data formatting:
   - Videos are converted to 25 FPS.
   - Audio is converted to 16 kHz mono.
   - The speaker's lip region is detected and cropped from each video frame.
   - Cropped frames are resized to a uniform 96x96 pixels.

   For a detailed look at the complete pre-processing steps, refer to the auto-avsr preparation guide.
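The frame-rate and audio conversions above can be sketched by building ffmpeg command lines from Python (a minimal sketch, assuming ffmpeg is installed; the file names are illustrative, and lip-region detection/cropping is omitted since it is handled by the auto-avsr pipeline):

```python
import subprocess

def build_ffmpeg_cmds(src, video_out, audio_out):
    """Build ffmpeg commands for the 25 FPS video and 16 kHz mono audio conversions."""
    video_cmd = [
        "ffmpeg", "-y", "-i", src,
        "-r", "25",        # resample video to 25 FPS
        "-an",             # drop the audio stream from the video output
        video_out,
    ]
    audio_cmd = [
        "ffmpeg", "-y", "-i", src,
        "-vn",             # drop the video stream
        "-ac", "1",        # convert audio to mono
        "-ar", "16000",    # resample audio to 16 kHz
        audio_out,
    ]
    return video_cmd, audio_cmd

# Hypothetical usage: build the commands, then run them with subprocess.
v_cmd, a_cmd = build_ffmpeg_cmds("clip.mp4", "clip_25fps.mp4", "clip_16k.wav")
# subprocess.run(v_cmd, check=True)
# subprocess.run(a_cmd, check=True)
```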
3. **Create CSV File List:** After pre-processing, create a CSV file (e.g., `data/your_dataset.csv`) with the following format:

   `absolute_video_file_path, video_frames, segment_label, audio_label, video_label`

   Example:

   `/your_path/FakeAVCeleb/video/FakeVideo-FakeAudio/African/men/id00366/00118_id00076_Isiq7cA-DNE_faceswap_id01170_wavtolip.mp4, 148, 0, 0, 0`

   For convenience, we've already prepared file lists for FakeAVCeleb and DFDC in the `data` folder.
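Generating such a file list can be sketched with the standard `csv` module (a minimal sketch; the paths, frame counts, and labels below are placeholders, and the label convention should follow the prepared lists in the `data` folder):

```python
import csv

# Hypothetical rows: (absolute_video_path, video_frames, segment_label,
# audio_label, video_label). Real entries would use actual paths, frame
# counts read from the decoded videos, and the dataset's label convention.
rows = [
    ("/your_path/clip_0001.mp4", 148, 0, 0, 0),
    ("/your_path/clip_0002.mp4", 212, 1, 1, 1),
]

with open("your_dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for path, frames, seg, aud, vid in rows:
        writer.writerow([path, frames, seg, aud, vid])
```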
Follow these steps to set up your environment and download the necessary models:

1. **Create Environment:**

   conda create -y -n fakevideodetection python=3.10
   conda activate fakevideodetection
   pip install -r requirements.txt
2. **Download Pre-trained Models:** Download the following pre-trained models from VSR for ML and place them in the `pretrained_model` folder:

   | Component   | WER  | URL                                   | Size (MB) |
   |-------------|------|---------------------------------------|-----------|
   | Visual-only | 19.1 | GoogleDrive or BaiduDrive (key: dqsy) | 891       |
   | Audio-only  | 1.0  | GoogleDrive or BaiduDrive (key: dvf2) | 860       |
Once everything is set up, you can run inference. Configure your settings in `run.sh` and then execute:

   bash run.sh