DFLIP-3K

Welcome to DFLIP-3K, a deepfake database for the development of convincing and explainable deepfake detection:

✅ 3K+ generative models: DFLIP-3K provides deepfake images generated by at least 3K generative models.

✅ Linguistic footprints of these deepfakes: DFLIP-3K offers an integrated framework for the implementation of state-of-the-art detection methods.

✅ Standardized Evaluations: DFLIP-3K introduces standardized evaluation metrics and protocols to enhance the transparency and reproducibility of performance evaluations.

✅ Open database: DFLIP-3K is an open database that fosters transparency and encourages collaborative efforts to further enhance its growth.

📋 Table of Contents

Linguistic Profiling of Deepfakes: An Open Database for Next-Generation Deepfake Detection
- Features
- Quick Start

📚 Features

DFLIP-3K has the following features:

⭐️ The DFLIP-3K database encompasses approximately 300K deepfake samples produced by about 3K generative models.
⭐️ 190K textual prompts that were used to create the images.
⭐️ Linguistic profiling in simultaneous deepfake detection, identification, and prompt prediction.

DFLIP-3K will be continuously updated to track the latest advances in deepfakes.

The collection of DFLIP-3K and implementations of detection methods is an ongoing project.

You are welcome to contribute your methods and data to DFLIP-3K.

Visualization

The project page displays a limited selection of DFLIP-3K samples, comprising images and prompts. https://dflip3k.github.io/DFLIP-3K/

⏳ Quick Start

1. Download Data

Please download the metadata we provide from this URL. The metadata is stored in this repository in JSON format. After downloading the metadata, store the files in the ./datasets folder.

Once you have downloaded the metadata, run the following commands to download the images:

cd utils

python downloader.py --meta_file [Path to JSON file].json --save_dir [where to save images]

Note that the download may fail several times due to unstable network connections, but the script can be restarted, and files that were already downloaded will not be downloaded again.

The downloaded data is organized as follows:

datasets
├── downloaded
│   ├── mj
│   │  ├──*.jpg
│   │  └──*.png
│   ├── sd
│   │  ├──*.jpg
│   │  └──*.png
│   ├── pd
│   ├── dalle
│   └── ...
├── pd.json
└── ...
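
The restart behaviour mentioned in the note can be pictured with a minimal sketch like the one below. It is not the code in downloader.py: the helper `download_missing`, the `(url, filename)` pairs, and the `.part` temporary suffix are all assumptions made for illustration, since the metadata schema is not described here. The idea is simply to skip any file that already exists and to write new files atomically, so an interrupted run never leaves a truncated image behind.

```python
from pathlib import Path
from typing import Callable, Iterable, Tuple
import urllib.request


def fetch_url(url: str) -> bytes:
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_missing(
    items: Iterable[Tuple[str, str]],
    save_dir: str,
    fetch: Callable[[str], bytes] = fetch_url,
) -> dict:
    """Download every (url, filename) pair whose file is not on disk yet.

    Hypothetical helper: the (url, filename) pairs stand in for whatever
    the metadata JSON actually stores.
    """
    root = Path(save_dir)
    root.mkdir(parents=True, exist_ok=True)
    summary = {"downloaded": 0, "skipped": 0, "failed": 0}
    for url, filename in items:
        target = root / filename
        if target.exists():
            summary["skipped"] += 1  # finished in an earlier run
            continue
        try:
            data = fetch(url)
        except OSError:
            summary["failed"] += 1  # left for the next restart
            continue
        # Write to a temporary name first so an interrupted run never
        # leaves a truncated file that a later run would skip.
        target.parent.mkdir(parents=True, exist_ok=True)
        partial = target.with_name(target.name + ".part")
        partial.write_bytes(data)
        partial.replace(target)
        summary["downloaded"] += 1
    return summary
```

Calling `download_missing` a second time with the same items only retries the entries that failed before, which is the behaviour the note describes.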

2. Preprocessing

After downloading all data, we strongly recommend converting all images to the same format (such as PNG, which we use in our dataset). This helps avoid errors caused by differing image formats during loading. This step is optional.
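
Before converting, it can help to see which formats are actually present, because a file extension does not always match the file's contents. The sketch below is an illustrative assumption, not part of the repository: the helpers `sniff_format` and `format_report` are hypothetical names, and they only inspect each file's leading bytes using the standard library. The conversion itself would still need an image library such as Pillow.

```python
from collections import Counter
from pathlib import Path

# Leading bytes ("magic numbers") that identify common image containers.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_format(path: Path) -> str:
    """Return the format implied by the file's first bytes, or 'unknown'."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return "unknown"


def format_report(image_root: str) -> Counter:
    """Count files under image_root by their actual (sniffed) format."""
    counts = Counter()
    for path in Path(image_root).rglob("*"):
        if path.is_file():
            counts[sniff_format(path)] += 1
    return counts
```

For example, `format_report("datasets/downloaded")` returns a counter of formats; any entry other than the chosen target format marks files that still need converting.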

3. Pretrained Weights

Pretrained weights are not available yet and will be released later.

4. Training

We provide an Otter-based implementation for the deepfake detection, identification, and prompt prediction tasks.

We use openflamingo-9b for training.

git clone https://github.com/dflip3k/Otter

cd Otter 

accelerate launch caption_ds.py \
  --pretrained_model_name_or_path=luodian/openflamingo-9b-hf \
  --dataset_resampled --multi_instruct_path=[Path to benchmark split JSON file] \
  --run_name=aiart --batch_size=1 --num_epochs=6 \
  --cross_attn_every_n_layers=4 --lr_scheduler=cosine --learning_rate=1e-5 \
  --data_root=[Path to dataset]
