
SynthForensics: Benchmarking and Evaluating People-Centric Synthetic Video Deepfakes

Abstract

Modern T2V/I2V generators synthesize people who are increasingly hard to distinguish from authentic footage, while current evaluation suites lag behind: legacy benchmarks target manipulation-based forgeries, and recent synthetic-video benchmarks prioritize scale over realistic human depiction. We introduce SynthForensics, a people-centric benchmark of 20,445 videos from 8 T2V and 7 I2V open-source generators. Each video is paired with a real source video from FF++ or DFD, validated by humans in two stages, and released in four compression versions with full metadata. In our paired-comparison human study, raters prefer SynthForensics in 71–77% of head-to-head comparisons against each of nine existing synthetic-video benchmarks, while facial-quality metrics fall within the FF++/DFD baseline range. Across 15 detectors and three protocols, face-based methods drop 13–55 AUC points (mean 27) from FF++ to SynthForensics and a further 23 points under aggressive compression; fine-tuning closes the gap at the cost of degraded performance on legacy benchmarks; and re-training shows that synthetic and manipulation features are largely disjoint for most detectors. We release the dataset, pipeline, and code.


Dataset Structure

SynthForensics/
├── T2V/
│   ├── videos/
│   │   ├── raw/
│   │   │   ├── cogvideox/           # <ID>_cogvideox_t2v.mp4
│   │   │   ├── daVinci-MagiHuman/
│   │   │   ├── helios/
│   │   │   ├── ltx2-3/
│   │   │   ├── magi-1/
│   │   │   ├── self-forcing/
│   │   │   ├── skyreels-v2/
│   │   │   └── wan2-1/
│   │   ├── canonical/               # same per-generator structure
│   │   ├── crf23/
│   │   └── crf40/
│   └── metadata/
│       ├── cogvideox/               # <ID>_cogvideox_t2v.json
│       ├── daVinci-MagiHuman/
│       └── …                        # one sub-folder per generator
├── I2V/
│   ├── videos/
│   │   ├── raw/
│   │   │   ├── cogvideox/           # <ID>_cogvideox_i2v.mp4
│   │   │   ├── daVinci-MagiHuman/
│   │   │   ├── helios/
│   │   │   ├── ltx2-3/
│   │   │   ├── magi-1/
│   │   │   ├── skyreels-v2/
│   │   │   └── wan2-1/
│   │   ├── canonical/               # same per-generator structure
│   │   ├── crf23/
│   │   └── crf40/
│   ├── i2v_frames/                  # <ID>.png — reference frames used as conditioning input
│   └── metadata/
│       ├── cogvideox/               # <ID>_cogvideox_i2v.json
│       └── …                        # one sub-folder per generator
├── captions/                        # <ID>.json — dense captions for FF++ and DFD source videos
├── train.json
├── test.json
├── val.json
└── README.md

Within both T2V/videos/ and I2V/videos/, samples are organized by compression level (raw, canonical, crf23, crf40) and, within each compression level, by generator name. Two distinct ID schemes are used depending on the source:

  • FF++ samples: <ID>_<generator>_t2v.mp4 / <ID>_<generator>_i2v.mp4, where <ID> is a zero-padded three-digit integer inherited from the FaceForensics++ dataset (e.g., 071_cogvideox_t2v.mp4).
  • DFD samples: <subject_id>__<scene>_<generator>_t2v.mp4 / <subject_id>__<scene>_<generator>_i2v.mp4, where <subject_id> is a two-digit zero-padded integer and <scene> is a descriptive scene name (e.g., 01__exit_phone_room_cogvideox_t2v.mp4).

In both cases <generator> matches the directory name (e.g., cogvideox, daVinci-MagiHuman, wan2-1). Metadata files in T2V/metadata/<generator>/ and I2V/metadata/<generator>/ follow the same naming patterns with a .json extension.
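The naming rules above can be turned into a small parser. The following is a minimal sketch, not part of the released code: the names `Sample`, `parse_sample_name`, and `video_path` are hypothetical, and the regular expressions assume that filenames follow the two ID schemes exactly as described.

```python
import re
from pathlib import Path
from typing import NamedTuple

# Generator directory names as listed in the dataset tree.
GENERATORS = (
    "cogvideox", "daVinci-MagiHuman", "helios", "ltx2-3",
    "magi-1", "self-forcing", "skyreels-v2", "wan2-1",
)
_GEN = "|".join(re.escape(g) for g in GENERATORS)
_TAIL = rf"_(?P<gen>{_GEN})_(?P<branch>t2v|i2v)\.(?:mp4|json)$"
# FF++: three-digit ID. DFD: two-digit subject, double underscore, scene name.
_FFPP = re.compile(r"^(?P<id>\d{3})" + _TAIL)
_DFD = re.compile(r"^(?P<id>\d{2}__.+)" + _TAIL)
COMPRESSIONS = ("raw", "canonical", "crf23", "crf40")


class Sample(NamedTuple):
    source: str     # "FF++" or "DFD"
    video_id: str   # e.g. "071" or "01__exit_phone_room"
    generator: str  # directory name, e.g. "cogvideox"
    branch: str     # "t2v" or "i2v"


def parse_sample_name(filename: str) -> Sample:
    """Split a video or metadata filename into its naming components."""
    for source, pattern in (("FF++", _FFPP), ("DFD", _DFD)):
        match = pattern.match(filename)
        if match:
            return Sample(source, match["id"], match["gen"], match["branch"])
    raise ValueError(f"unrecognized sample name: {filename!r}")


def video_path(root: Path, sample: Sample, compression: str) -> Path:
    """Build <root>/<BRANCH>/videos/<compression>/<generator>/<file>.mp4."""
    if compression not in COMPRESSIONS:
        raise ValueError(f"unknown compression level: {compression!r}")
    name = f"{sample.video_id}_{sample.generator}_{sample.branch}.mp4"
    return (root / sample.branch.upper() / "videos" / compression
            / sample.generator / name)
```

For example, `parse_sample_name("01__exit_phone_room_cogvideox_t2v.mp4")` yields a DFD sample with `video_id` equal to `01__exit_phone_room`. The generator list is matched explicitly because scene names contain underscores, so splitting on `_` alone would be ambiguous.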


Dataset Splits

The files train.json, test.json, and val.json each contain a list of video identifiers (zero-padded three-digit strings, e.g., "071", "954") that define the official training, test, and validation partitions of the benchmark. These identifiers are inherited directly from the FaceForensics++ dataset splits, ensuring full compatibility with the FF++ evaluation protocol.

The identifiers serve a dual purpose:

  1. Fake video selection. For each generator, only the videos whose numeric ID appears in the corresponding split file should be included in that partition. Concretely, given a split set $\mathcal{S}$ and a generator $g$, the subset of fake videos assigned to that partition is:

$$\mathcal{F}_{g,\mathcal{S}} = \{\, \texttt{<ID>\_<g>\_<branch>.mp4} \mid \texttt{ID} \in \mathcal{S} \,\}$$

Here <branch> is t2v or i2v. This selection applies uniformly across all generators in both the T2V and I2V branches, at every available compression level.

  2. Real video selection. The same identifiers correspond to the real (pristine) videos from the FaceForensics++ dataset that should be treated as the authentic counterpart for each partition. Detectors trained or evaluated on SynthForensics are therefore expected to use the FF++ real videos indexed by the same IDs as the negative class, preserving the one-to-one correspondence between real and fake samples established by the original FF++ benchmark.
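The two uses of the split identifiers can be sketched as follows. This is an illustrative sketch under stated assumptions: `load_split` and `select_fakes` are hypothetical helpers, and each split file is assumed to be a flat JSON list of ID strings, as the description above suggests.

```python
import json
from pathlib import Path


def load_split(path: Path) -> set:
    """Read a split file, assumed to be a flat JSON list of ID strings."""
    with open(path, encoding="utf-8") as handle:
        return set(json.load(handle))


def select_fakes(filenames, split_ids, generator, branch):
    """Map each FF++ real ID in the split to its fake video filename.

    Keeps only files named <ID>_<generator>_<branch>.mp4 whose ID is in
    split_ids. The dictionary key is also the ID of the paired real video.
    """
    suffix = f"_{generator}_{branch}.mp4"
    selected = {}
    for name in filenames:
        if not name.endswith(suffix):
            continue
        video_id = name[: -len(suffix)]
        if video_id in split_ids:
            selected[video_id] = name
    return selected
```

Because the returned keys are the FF++ IDs themselves, the same keys can be used to look up the real videos that form the negative class for that partition.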

DeepFakeDetection (DFD) Test Videos

The test partition is additionally supplemented with the full DeepFakeDetection (DFD) dataset. Unlike the SynthForensics generators — whose test samples are selected via the ID-based mechanism described above — all DFD videos are included in the test split without any ID-based filtering. DFD videos follow the naming convention <subject_id>__<scene>.mp4 (e.g., 01__exit_phone_room.mp4) and are drawn from 16 distinct scenarios across multiple subjects. These samples serve as an out-of-domain evaluation source, enabling assessment of detector generalization beyond the FF++-aligned fake distribution.
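The resulting assignment rule, with FF++ IDs looked up in the split files and DFD videos always placed in the test partition, might look like the sketch below. The function name `partition_of` is hypothetical, and the split contents in the usage note are toy values rather than the real split membership.

```python
import re

_FFPP_ID = re.compile(r"^\d{3}$")      # e.g. "071"
_DFD_ID = re.compile(r"^\d{2}__.+$")   # e.g. "01__exit_phone_room"


def partition_of(video_id, splits):
    """Return the partition name for a video ID, or None if unassigned.

    splits maps a partition name ("train", "val", "test") to a set of
    FF++ IDs. DFD IDs bypass the lookup and always belong to "test".
    """
    if _DFD_ID.match(video_id):
        return "test"
    if _FFPP_ID.match(video_id):
        for name, ids in splits.items():
            if video_id in ids:
                return name
        return None
    raise ValueError(f"unrecognized video ID: {video_id!r}")
```

With toy splits such as `{"train": {"001"}, "val": {"002"}, "test": {"003"}}`, the call `partition_of("01__exit_phone_room", splits)` returns `"test"` regardless of the split contents, while `partition_of("001", splits)` returns `"train"`.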


Generators

| Branch | Display name | Directory name | Videos (raw) |
|---|---|---|---|
| T2V | CogVideoX | cogvideox | 1,363 |
| T2V | DaVinci-MagiHuman | daVinci-MagiHuman | 1,363 |
| T2V | Helios | helios | 1,363 |
| T2V | LTX-2.3 | ltx2-3 | 1,363 |
| T2V | Magi-1 | magi-1 | 1,363 |
| T2V | Self-Forcing | self-forcing | 1,363 |
| T2V | SkyReels-V2 | skyreels-v2 | 1,363 |
| T2V | Wan2.1 | wan2-1 | 1,363 |
| I2V | CogVideoX | cogvideox | 1,363 |
| I2V | DaVinci-MagiHuman | daVinci-MagiHuman | 1,363 |
| I2V | Helios | helios | 1,363 |
| I2V | LTX-2.3 | ltx2-3 | 1,363 |
| I2V | Magi-1 | magi-1 | 1,363 |
| I2V | SkyReels-V2 | skyreels-v2 | 1,363 |
| I2V | Wan2.1 | wan2-1 | 1,363 |
| Total (raw) | 15 T2V+I2V generators | | 20,445 |
| Total (all compressions) | 15 generators × 4 compression levels | | 81,780 |
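As a quick sanity check, the totals follow directly from the per-generator count. The snippet below is only a worked calculation using the figures in the table; the variable names are illustrative.

```python
VIDEOS_PER_GENERATOR = 1_363
T2V_GENERATORS = 8
I2V_GENERATORS = 7
COMPRESSION_LEVELS = ("raw", "canonical", "crf23", "crf40")

t2v_videos = T2V_GENERATORS * VIDEOS_PER_GENERATOR
i2v_videos = I2V_GENERATORS * VIDEOS_PER_GENERATOR
unique_videos = t2v_videos + i2v_videos
total_files = unique_videos * len(COMPRESSION_LEVELS)

print(t2v_videos, i2v_videos, unique_videos, total_files)
# 10904 9541 20445 81780
```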

Overall Statistics

| Metric | Value |
|---|---|
| Unique Synthetic Videos (T2V) | 10,904 |
| Unique Synthetic Videos (I2V) | 9,541 |
| Total Unique Synthetic Videos | 20,445 |
| Total Video Files (4 compressions) | 81,780 |
| Total Unique Frames | 1,934,097 |
| Total Unique Video Duration | ~27.2 hours |
| Landscape Videos | 16,349 |
| Portrait Videos | 4,096 |
| Resolution Range (W×H) | 640×384 – 1920×1088 |
| Frame Rate Range (FPS) | 8 – 25 |
| Duration Range (s) | 4 – 6 |

Resolutions

Resolutions are reported for the raw (uncompressed) videos; compressed versions preserve the same dimensions. Orientation: L = landscape (W > H), P = portrait (H > W).

| Branch | Generator | Resolution (W×H) | Orient. | Count (raw) |
|---|---|---|---|---|
| T2V | CogVideoX | 720×480 | L | 1,363 |
| T2V | DaVinci-MagiHuman | 1920×1088 | L | 667 |
| T2V | DaVinci-MagiHuman | 1088×1920 | P | 696 |
| T2V | Helios | 640×384 | L | 1,363 |
| T2V | LTX-2.3 | 1536×1024 | L | 703 |
| T2V | LTX-2.3 | 1024×1536 | P | 660 |
| T2V | Magi-1 | 1280×720 | L | 665 |
| T2V | Magi-1 | 720×1280 | P | 698 |
| T2V | Self-Forcing | 832×480 | L | 664 |
| T2V | Self-Forcing | 480×832 | P | 699 |
| T2V | SkyReels-V2 | 960×544 | L | 702 |
| T2V | SkyReels-V2 | 544×960 | P | 661 |
| T2V | Wan2.1 | 832×480 | L | 689 |
| T2V | Wan2.1 | 480×832 | P | 674 |
| I2V | CogVideoX | 720×480 | L | 1,363 |
| I2V | DaVinci-MagiHuman | 1920×1088 | L | 1,361 |
| I2V | DaVinci-MagiHuman | 1088×1920 | P | 2 |
| I2V | Helios | 640×384 | L | 1,363 |
| I2V | LTX-2.3 | 1536×1024 | L | 1,361 |
| I2V | LTX-2.3 | 1024×1536 | P | 2 |
| I2V | Magi-1 | 1280×720 | L | 1,363 |
| I2V | SkyReels-V2 | 960×544 | L | 1,361 |
| I2V | SkyReels-V2 | 544×960 | P | 2 |
| I2V | Wan2.1 | 832×464 | L | 917 |
| I2V | Wan2.1 | 720×544 | L | 273 |
| I2V | Wan2.1 | 736×528 | L | 89 |
| I2V | Wan2.1 | 704×560 | L | 51 |
| I2V | Wan2.1 | 768×512 | L | 28 |
| I2V | Wan2.1 | 800×480 | L | 1 |
| I2V | Wan2.1 | 816×480 | L | 1 |
| I2V | Wan2.1 | 688×560 | L | 1 |
| I2V | Wan2.1 | 464×832 | P | 1 |
| I2V | Wan2.1 | 608×640 | P | 1 |
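The orientation rule and the landscape/portrait totals can be cross-checked against the rows above. This is a small sketch: `orientation` is a hypothetical helper, and `ROWS` is transcribed by hand from the resolution table.

```python
from collections import Counter


def orientation(width: int, height: int) -> str:
    """Return "L" for landscape (W > H) or "P" for portrait (H > W)."""
    if width == height:
        raise ValueError("square frames are not covered by the L/P rule")
    return "L" if width > height else "P"


# (branch, generator, width, height, raw count)
ROWS = [
    ("T2V", "CogVideoX", 720, 480, 1363),
    ("T2V", "DaVinci-MagiHuman", 1920, 1088, 667),
    ("T2V", "DaVinci-MagiHuman", 1088, 1920, 696),
    ("T2V", "Helios", 640, 384, 1363),
    ("T2V", "LTX-2.3", 1536, 1024, 703),
    ("T2V", "LTX-2.3", 1024, 1536, 660),
    ("T2V", "Magi-1", 1280, 720, 665),
    ("T2V", "Magi-1", 720, 1280, 698),
    ("T2V", "Self-Forcing", 832, 480, 664),
    ("T2V", "Self-Forcing", 480, 832, 699),
    ("T2V", "SkyReels-V2", 960, 544, 702),
    ("T2V", "SkyReels-V2", 544, 960, 661),
    ("T2V", "Wan2.1", 832, 480, 689),
    ("T2V", "Wan2.1", 480, 832, 674),
    ("I2V", "CogVideoX", 720, 480, 1363),
    ("I2V", "DaVinci-MagiHuman", 1920, 1088, 1361),
    ("I2V", "DaVinci-MagiHuman", 1088, 1920, 2),
    ("I2V", "Helios", 640, 384, 1363),
    ("I2V", "LTX-2.3", 1536, 1024, 1361),
    ("I2V", "LTX-2.3", 1024, 1536, 2),
    ("I2V", "Magi-1", 1280, 720, 1363),
    ("I2V", "SkyReels-V2", 960, 544, 1361),
    ("I2V", "SkyReels-V2", 544, 960, 2),
    ("I2V", "Wan2.1", 832, 464, 917),
    ("I2V", "Wan2.1", 720, 544, 273),
    ("I2V", "Wan2.1", 736, 528, 89),
    ("I2V", "Wan2.1", 704, 560, 51),
    ("I2V", "Wan2.1", 768, 512, 28),
    ("I2V", "Wan2.1", 800, 480, 1),
    ("I2V", "Wan2.1", 816, 480, 1),
    ("I2V", "Wan2.1", 688, 560, 1),
    ("I2V", "Wan2.1", 464, 832, 1),
    ("I2V", "Wan2.1", 608, 640, 1),
]

# Tally raw video counts per orientation.
totals = Counter()
for _branch, _generator, width, height, count in ROWS:
    totals[orientation(width, height)] += count

print(totals["L"], totals["P"])
# 16349 4096
```

The tallies match the Landscape Videos and Portrait Videos figures reported under Overall Statistics.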
