DFLIP-3K

License: CC BY-NC 4.0 Release .10 PyTorch Python


Paper Data
Methods

Welcome to DFLIP-3K, a deepfake database for developing convincing and explainable deepfake detection:

3K+ generative models: DFLIP-3K provides deepfake images generated by more than 3K generative models.

Linguistic footprints of these deepfakes: DFLIP-3K offers an integrated framework for the implementation of state-of-the-art detection methods.

Standardized Evaluations: DFLIP-3K introduces standardized evaluation metrics and protocols to enhance the transparency and reproducibility of performance evaluations.

Open database: DFLIP-3K is an open database that fosters transparency and encourages collaborative efforts to further enhance its growth.


📚 Features

DFLIP-3K has the following features:

⭐️ DFLIP-3K database encompasses approximately 300K deepfake samples produced from about 3K generative models.
⭐️ 190K textual prompts that are used to create images.
⭐️ Linguistic profiling for simultaneous deepfake detection, identification, and prompt prediction.

DFLIP-3K will be continuously updated to track the latest advances in deepfake generation.

Collecting data for DFLIP-3K and implementing detection methods is an ongoing project.

You are welcome to contribute your methods and data to DFLIP-3K.

Visualization

The project page displays a limited selection of DFLIP-3K samples, comprising images and prompts. https://dflip3k.github.io/DFLIP-3K/

⏳ Quick Start

1. Download Data

Please download the metadata we provide from this URL. The metadata is stored in this repository in JSON format. After downloading, place the metadata files in the ./datasets folder.
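Before downloading images, it can help to confirm that the metadata files landed in ./datasets and parse as JSON. The sketch below is not part of the repository; the function name `summarize_metadata` is hypothetical, and it makes no assumption about the metadata schema beyond the files being valid JSON.

```python
import json
from pathlib import Path


def summarize_metadata(datasets_dir="./datasets"):
    """Return {file name: number of top-level entries} for each JSON file."""
    summary = {}
    for path in sorted(Path(datasets_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Count entries whether the top level is a list or a dict;
        # the actual schema of the metadata is not assumed here.
        summary[path.name] = len(data) if isinstance(data, (list, dict)) else 1
    return summary


if __name__ == "__main__":
    for name, count in summarize_metadata().items():
        print(f"{name}: {count} top-level entries")
```

A file that fails to parse raises `json.JSONDecodeError`, which usually indicates an incomplete download.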

Once you have downloaded the metadata, run the commands below to download the images. The download may fail several times because of unstable network connections, but the script can be restarted, and files that were already downloaded will not be fetched again:

cd utils

python downloader.py --meta_file [Path to JSON file].json --save_dir [where to save images]

The datasets folder is then expected to look like this:

datasets
├── downloaded
│   ├── mj
│   │  ├──*.jpg
│   │  └──*.png
│   ├── sd
│   │  ├──*.jpg
│   │  └──*.png
│   ├── pd
│   ├── dalle
│   └── ...
├── pd.json
└── ...
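
The restart behavior described above (files already on disk are skipped, failed downloads are retried) can be sketched as follows. This is an illustration of the idea, not the repository's downloader.py: the record keys `"url"` and `"filename"`, the function names, and the retry parameters are all assumptions, since the metadata schema is not documented here.

```python
import time
import urllib.request
from pathlib import Path


def fetch_url(url, dest):
    """Download url to dest via a temporary file so partial files are never kept."""
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, str(tmp))
    tmp.replace(dest)


def download_missing(records, save_dir, fetch=fetch_url, retries=3, wait=1.0):
    """Download every record whose target file does not exist yet.

    records: iterable of dicts with the hypothetical keys "url" and "filename".
    Returns the filenames that still failed after all retries.
    """
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for rec in records:
        dest = save_dir / rec["filename"]
        if dest.exists():
            continue  # finished in an earlier run, so skip it
        for _ in range(retries):
            try:
                fetch(rec["url"], dest)
                break
            except OSError:  # urllib network errors derive from OSError
                time.sleep(wait)
        else:
            failed.append(rec["filename"])
    return failed
```

Because a file only appears under its final name after a complete download, rerunning the function resumes where the previous run stopped.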

2. Preprocessing

After downloading all data, we strongly recommend converting all images to a single format (PNG in our dataset). This reduces loading errors caused by mixed image formats. This step is optional.
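
One possible way to do this conversion is sketched below. It is not a script shipped with the repository: the function names and the set of source extensions are assumptions, and the conversion itself relies on the third-party Pillow package.

```python
from pathlib import Path

# Extensions treated as "not yet PNG"; this set is an assumption.
NON_PNG_SUFFIXES = {".jpg", ".jpeg", ".webp"}


def find_non_png(root):
    """Return all image files under root that are not stored as PNG."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in NON_PNG_SUFFIXES
    )


def convert_to_png(path, delete_original=False):
    """Re-encode one image as PNG next to the original and return the new path."""
    from PIL import Image  # Pillow, imported lazily so listing works without it

    target = path.with_suffix(".png")  # overwrites an existing file of that name
    with Image.open(path) as img:
        # Converting to RGB drops any alpha channel.
        img.convert("RGB").save(target, format="PNG")
    if delete_original:
        path.unlink()
    return target


if __name__ == "__main__":
    for image_path in find_non_png("./datasets/downloaded"):
        print(convert_to_png(image_path))
```

Originals are kept by default, so the conversion can be checked before any files are deleted.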

3. Pretrained Weights

Pretrained weights have not been released yet. Please check back later.

4. Training

We provide an Otter-based implementation for the deepfake detection, identification, and prompt prediction tasks.

We use openflamingo-9b for training.

git clone https://github.com/dflip3k/Otter

cd Otter 

accelerate launch caption_ds.py \
  --pretrained_model_name_or_path=luodian/openflamingo-9b-hf \
  --dataset_resampled --multi_instruct_path=[Path to benchmark split JSON file] \
  --run_name=aiart --batch_size=1 --num_epochs=6 \
  --cross_attn_every_n_layers=4 --lr_scheduler=cosine --learning_rate=1e-5 \
  --data_root=[Path to dataset]
