
SynthForensics: Benchmarking and Evaluating People-Centric Synthetic Video Deepfakes

Abstract

Modern text-to-video (T2V) and image-to-video (I2V) generators synthesize people who are increasingly hard to distinguish from authentic footage, while current evaluation suites lag behind: legacy benchmarks target manipulation-based forgeries, and recent synthetic-video benchmarks prioritize scale over realistic human depiction. We introduce SynthForensics, a people-centric benchmark of 20,445 videos from 8 T2V and 7 I2V open-source generators. Each video is paired with a real source video from FF++ or DFD, validated by humans in two stages, and released in four compression versions with full metadata. In our paired-comparison human study, raters prefer SynthForensics in 71–77% of head-to-head comparisons against each of nine existing synthetic-video benchmarks, while facial-quality metrics fall within the FF++/DFD baseline range. Across 15 detectors and three protocols, face-based methods drop 13–55 AUC points (mean 27) from FF++ to SynthForensics and a further 23 points under aggressive compression; fine-tuning closes the gap at the cost of degraded performance on legacy benchmarks; and re-training shows that synthetic and manipulation features are largely disjoint for most detectors. We release the dataset, pipeline, and code.

Dataset Structure

SynthForensics/
├── T2V/
│   ├── videos/
│   │   ├── raw/
│   │   │   ├── cogvideox/           # <ID>_cogvideox_t2v.mp4
│   │   │   ├── daVinci-MagiHuman/
│   │   │   ├── helios/
│   │   │   ├── ltx2-3/
│   │   │   ├── magi-1/
│   │   │   ├── self-forcing/
│   │   │   ├── skyreels-v2/
│   │   │   └── wan2-1/
│   │   ├── canonical/               # same per-generator structure
│   │   ├── crf23/
│   │   └── crf40/
│   └── metadata/
│       ├── cogvideox/               # <ID>_cogvideox_t2v.json
│       ├── daVinci-MagiHuman/
│       └── …                        # one sub-folder per generator
├── I2V/
│   ├── videos/
│   │   ├── raw/
│   │   │   ├── cogvideox/           # <ID>_cogvideox_i2v.mp4
│   │   │   ├── daVinci-MagiHuman/
│   │   │   ├── helios/
│   │   │   ├── ltx2-3/
│   │   │   ├── magi-1/
│   │   │   ├── skyreels-v2/
│   │   │   └── wan2-1/
│   │   ├── canonical/               # same per-generator structure
│   │   ├── crf23/
│   │   └── crf40/
│   ├── i2v_frames/                  # <ID>.png — reference frames used as conditioning input
│   └── metadata/
│       ├── cogvideox/               # <ID>_cogvideox_i2v.json
│       └── …                        # one sub-folder per generator
├── captions/                        # <ID>.json — dense captions for FF++ and DFD source videos
├── train.json
├── test.json
├── val.json
└── README.md

Within both T2V/videos/ and I2V/videos/, samples are organized by compression level (raw, canonical, crf23, crf40) and, within each compression level, by generator name. Two distinct ID schemes are used depending on the source:

FF++ samples — <ID>_<generator>_t2v.mp4 / <ID>_<generator>_i2v.mp4, where <ID> is a zero-padded three-digit integer inherited from the FaceForensics++ dataset (e.g., 071_cogvideox_t2v.mp4).
DFD samples — <subject_id>__<scene>_<generator>_t2v.mp4 / <subject_id>__<scene>_<generator>_i2v.mp4, where <subject_id> is a two-digit zero-padded integer and <scene> is a descriptive scene name (e.g., 01__exit_phone_room_cogvideox_t2v.mp4).

In both cases <generator> matches the directory name (e.g., cogvideox, daVinci-MagiHuman, wan2-1). Metadata files in T2V/metadata/<generator>/ and I2V/metadata/<generator>/ follow the same naming patterns with a .json extension.
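The two naming schemes can be told apart mechanically. The following is a minimal sketch, not part of the released code: `SampleName` and `parse_sample_name` are hypothetical names, and the patterns assume that generator directory names never contain an underscore (true for all names listed in this README).

```python
import re
from typing import NamedTuple, Optional


class SampleName(NamedTuple):
    source: str     # "ff++" or "dfd"
    sample_id: str  # e.g. "071" or "01__exit_phone_room"
    generator: str  # directory name, e.g. "cogvideox"
    branch: str     # "t2v" or "i2v"


# Assumption: generator names contain hyphens but no underscores.
_FFPP = re.compile(r"^(\d{3})_([^_]+)_(t2v|i2v)\.(?:mp4|json)$")
_DFD = re.compile(r"^(\d{2}__.+)_([^_]+)_(t2v|i2v)\.(?:mp4|json)$")


def parse_sample_name(filename: str) -> Optional[SampleName]:
    """Split a video or metadata filename into its naming components."""
    match = _FFPP.match(filename)
    if match:
        return SampleName("ff++", *match.groups())
    match = _DFD.match(filename)
    if match:
        return SampleName("dfd", *match.groups())
    return None  # not a SynthForensics sample name
```

For DFD samples the scene name may itself contain underscores, so the sketch keeps `<subject_id>__<scene>` together as one identifier and reads the generator and branch from the end of the name.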

Dataset Splits

The files train.json, test.json, and val.json each contain a list of video identifiers (zero-padded three-digit strings, e.g., "071", "954") that define the official training, test, and validation partitions of the benchmark. These identifiers are inherited directly from the FaceForensics++ dataset splits, ensuring full compatibility with the FF++ evaluation protocol.

The identifiers serve a dual purpose:

Fake video selection. For each generator, only the videos whose numeric ID appears in the corresponding split file should be included in that partition. Concretely, given a split set $\mathcal{S}$ and a generator $g$, the subset of fake videos assigned to that partition is:

$$\mathcal{F}_{g,\mathcal{S}} = \{\, \texttt{<ID>\_<g>.mp4} \mid \texttt{ID} \in \mathcal{S} \,\}$$

This selection applies uniformly across all generators in both the T2V and I2V branches, at every available compression level.

Real video selection. The same identifiers correspond to the real (pristine) videos from the FaceForensics++ dataset that should be treated as the authentic counterpart for each partition. Detectors trained or evaluated on SynthForensics are therefore expected to use the FF++ real videos indexed by the same IDs as the negative class, preserving the one-to-one correspondence between real and fake samples established by the original FF++ benchmark.
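The fake-video selection rule can be sketched as follows. This is an illustration under stated assumptions rather than the official loader: `load_split` and `select_fakes` are hypothetical helpers, each split file is assumed to be a flat JSON list of ID strings, and paths follow the layout shown under Dataset Structure. Real videos are not handled here because their location depends on the local FF++ download.

```python
import json
from pathlib import Path


def load_split(path):
    """Read a split file, assumed to be a flat JSON list of ID strings."""
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def select_fakes(root, split_ids, branch, generator, compression="raw"):
    """Return the existing fake videos of one generator whose ID is in the split.

    branch is "T2V" or "I2V"; compression is one of raw, canonical, crf23, crf40.
    """
    folder = Path(root) / branch / "videos" / compression / generator
    suffix = f"_{generator}_{branch.lower()}.mp4"
    candidates = (folder / f"{video_id}{suffix}" for video_id in sorted(split_ids))
    # IDs without a matching file are skipped rather than raising an error
    return [path for path in candidates if path.is_file()]
```

A call such as `select_fakes("SynthForensics", load_split("SynthForensics/train.json"), "T2V", "cogvideox", "crf23")` would then list the CogVideoX T2V training videos at the crf23 compression level.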

DeepFakeDetection (DFD) Test Videos

The test partition is additionally supplemented with the full DeepFakeDetection (DFD) dataset. Unlike the FF++-derived samples, which are assigned to partitions through the ID-based mechanism described above, all DFD videos are included in the test split without any ID-based filtering. DFD videos follow the naming convention <subject_id>__<scene>.mp4 (e.g., 01__exit_phone_room.mp4) and are drawn from 16 distinct scenarios across multiple subjects. These samples serve as an out-of-domain evaluation source, enabling assessment of detector generalization beyond the FF++-aligned fake distribution.
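Combining the two rules for the test partition might look like the sketch below. It assumes that DFD-style files found in a generator directory are treated like the DFD videos described above, i.e. all of them go to the test split; this is our reading of the text, not an explicitly documented rule. `assemble_test_set` is a hypothetical helper.

```python
def assemble_test_set(filenames, test_ids):
    """Keep FF++-style files whose ID is in test_ids, plus every DFD-style file."""
    selected = []
    for name in filenames:
        prefix, separator, _ = name.partition("__")
        if separator and len(prefix) == 2 and prefix.isdigit():
            # DFD-style name (<subject_id>__<scene>...): no ID-based filtering
            selected.append(name)
        elif name[:3].isdigit() and name[3:4] == "_" and name[:3] in test_ids:
            # FF++-style name (<ID>_...): filtered by the test split IDs
            selected.append(name)
    return selected
```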

Generators

Branch	Display name	Directory name	Videos (raw)
T2V	CogVideoX	`cogvideox`	1,363
T2V	DaVinci-MagiHuman	`daVinci-MagiHuman`	1,363
T2V	Helios	`helios`	1,363
T2V	LTX-2.3	`ltx2-3`	1,363
T2V	Magi-1	`magi-1`	1,363
T2V	Self-Forcing	`self-forcing`	1,363
T2V	SkyReels-V2	`skyreels-v2`	1,363
T2V	Wan2.1	`wan2-1`	1,363
I2V	CogVideoX	`cogvideox`	1,363
I2V	DaVinci-MagiHuman	`daVinci-MagiHuman`	1,363
I2V	Helios	`helios`	1,363
I2V	LTX-2.3	`ltx2-3`	1,363
I2V	Magi-1	`magi-1`	1,363
I2V	SkyReels-V2	`skyreels-v2`	1,363
I2V	Wan2.1	`wan2-1`	1,363
Total (raw)	15 generators (8 T2V + 7 I2V)		20,445
Total (all compressions)	15 generators × 4 compression levels		81,780

Overall Statistics

Metric	Value
Unique Synthetic Videos (T2V)	10,904
Unique Synthetic Videos (I2V)	9,541
Total Unique Synthetic Videos	20,445
Total Video Files (4 compressions)	81,780
Total Unique Frames	1,934,097
Total Unique Video Duration	~27.2 hours
Landscape Videos	16,349
Portrait Videos	4,096
Resolution Range (W×H)	640×384 – 1920×1088
Frame Rate Range (FPS)	8 – 25
Duration Range (s)	4 – 6
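The totals above follow directly from the per-generator counts. The short check below only restates numbers from the two tables; the per-video averages at the end are derived here and are not reported elsewhere in this README.

```python
VIDEOS_PER_GENERATOR = 1_363
T2V_GENERATORS, I2V_GENERATORS = 8, 7
COMPRESSION_LEVELS = 4  # raw, canonical, crf23, crf40

t2v = T2V_GENERATORS * VIDEOS_PER_GENERATOR
i2v = I2V_GENERATORS * VIDEOS_PER_GENERATOR
unique = t2v + i2v
files = unique * COMPRESSION_LEVELS

# Values reported in the Generators and Overall Statistics tables
assert (t2v, i2v, unique, files) == (10_904, 9_541, 20_445, 81_780)
assert 16_349 + 4_096 == unique  # landscape + portrait

# Derived averages; 27.2 hours is itself an approximate figure
mean_frames = 1_934_097 / unique
mean_seconds = 27.2 * 3600 / unique
print(f"{mean_frames:.1f} frames and {mean_seconds:.2f} s per video on average")
```

The derived mean duration falls inside the reported 4–6 s duration range, which is a useful sanity check after downloading the dataset.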

Resolutions

Resolutions are reported for the raw (uncompressed) videos; compressed versions preserve the same dimensions. Orientation: L = landscape (W > H), P = portrait (H > W).

Branch	Generator	Resolution (W×H)	Orient.	Count (raw)
T2V	CogVideoX	720×480	L	1,363
T2V	DaVinci-MagiHuman	1920×1088	L	667
T2V	DaVinci-MagiHuman	1088×1920	P	696
T2V	Helios	640×384	L	1,363
T2V	LTX-2.3	1536×1024	L	703
T2V	LTX-2.3	1024×1536	P	660
T2V	Magi-1	1280×720	L	665
T2V	Magi-1	720×1280	P	698
T2V	Self-Forcing	832×480	L	664
T2V	Self-Forcing	480×832	P	699
T2V	SkyReels-V2	960×544	L	702
T2V	SkyReels-V2	544×960	P	661
T2V	Wan2.1	832×480	L	689
T2V	Wan2.1	480×832	P	674
I2V	CogVideoX	720×480	L	1,363
I2V	DaVinci-MagiHuman	1920×1088	L	1,361
I2V	DaVinci-MagiHuman	1088×1920	P	2
I2V	Helios	640×384	L	1,363
I2V	LTX-2.3	1536×1024	L	1,361
I2V	LTX-2.3	1024×1536	P	2
I2V	Magi-1	1280×720	L	1,363
I2V	SkyReels-V2	960×544	L	1,361
I2V	SkyReels-V2	544×960	P	2
I2V	Wan2.1	832×464	L	917
I2V	Wan2.1	720×544	L	273
I2V	Wan2.1	736×528	L	89
I2V	Wan2.1	704×560	L	51
I2V	Wan2.1	768×512	L	28
I2V	Wan2.1	800×480	L	1
I2V	Wan2.1	816×480	L	1
I2V	Wan2.1	688×560	L	1
I2V	Wan2.1	464×832	P	1
I2V	Wan2.1	608×640	P	1
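The orientation labels in the table follow the W > H / H > W convention stated above. A minimal sketch of that rule is given below; `orientation` is a hypothetical helper, and because the convention does not cover square resolutions, the sketch raises an error for them.

```python
def orientation(resolution: str) -> str:
    """Classify a 'W×H' string (an ASCII 'x' is also accepted) as 'L' or 'P'."""
    width, height = (int(value) for value in resolution.replace("x", "×").split("×"))
    if width > height:
        return "L"  # landscape
    if height > width:
        return "P"  # portrait
    raise ValueError(f"square resolution {resolution!r} has no defined orientation")
```

For example, `orientation("608×640")` returns `"P"`, matching the last Wan2.1 I2V row, even though that resolution is close to square.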
