Enhancing the domain generalization capability of Face Anti-Spoofing (FAS) remains a challenge. Existing methods aim to extract domain-invariant features from various training domains. Despite the promising performance, the extracted features inevitably contain residual style feature bias (e.g., illumination, capture device), resulting in inferior generalization performance.
In this paper, we propose the Text Guided Domain Generalization (TeG-DG) framework, which effectively leverages text information for cross-domain alignment. As an abstract and universal representation, text can capture the commonalities and essential characteristics of various attacks, bridging the gap between different image domains. Unlike existing vision-language models, the proposed framework is elaborately designed to enhance the domain generalization ability of the FAS task. Concretely, we design a Text Prompter (TP) that dynamically generates text prompts and a Hierarchical Attention Fusion (HAF) module that integrates multiple levels of visual features. Furthermore, we propose a Textually Enhanced Visual Discriminator (TEVD) that not only improves vision-language alignment but also regularizes the classifier with textual features. Extensive experiments demonstrate that TeG-DG significantly outperforms prior approaches, particularly in scenarios with limited source-domain data.
Overview of the proposed Text Guided Domain Generalization (TeG-DG) framework.
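The cross-modal alignment idea above can be illustrated with a toy example: class-level text embeddings act as domain-independent anchors, and an image feature is classified by its similarity to each prompt. All vectors and prompt strings below are invented for illustration; the actual model uses learned CLIP-style encoders and the TP/HAF/TEVD modules described in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Pretend text embeddings for two class prompts (purely illustrative values).
text_feats = {
    "a photo of a real face": [0.9, 0.1, 0.2],
    "a photo of a spoof face": [0.1, 0.9, 0.3],
}

def classify(image_feat):
    """Assign the class whose text embedding is most similar to the image feature."""
    return max(text_feats, key=lambda p: cosine(image_feat, text_feats[p]))

print(classify([0.8, 0.2, 0.1]))  # closest to the "real face" prompt
```

Because the anchors live in the shared text space rather than any single image domain, the decision rule itself carries no image-domain-specific bias.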
Prerequisites
- Linux
- CPU or NVIDIA GPU + CUDA + cuDNN
- Python 3.x
We recommend using Anaconda to create a conda environment:

```shell
conda create -n TEGDG python=3.8
conda activate TEGDG
```

Install PyTorch (2.0 or later) and torchvision:

```shell
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 -c pytorch
```

Install some additional dependencies:

```shell
pip install -r requirements.txt
```

Implement

```shell
sh train_MCIO.sh
```

We evaluate the proposed method on four publicly available datasets: MSU-MFSD (denoted as M), Replay-Attack (denoted as I), CASIA-MFSD (denoted as C), and OULU-NPU (denoted as O).
Following previous DG-FAS works, we adopt a leave-one-out (LOO) strategy: three datasets are selected for training, and the remaining one is used for testing.
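The LOO protocol above can be sketched as follows. The dataset abbreviations follow the paper; the helper function is a hypothetical illustration, not part of the released code.

```python
# The four benchmark datasets, abbreviated as in the paper:
# M = MSU-MFSD, C = CASIA-MFSD, I = Replay-Attack, O = OULU-NPU.
DATASETS = ["M", "C", "I", "O"]

def leave_one_out_splits(datasets):
    """Yield (train_domains, test_domain) pairs: each dataset is held out once."""
    for test in datasets:
        train = [d for d in datasets if d != test]
        yield train, test

for train, test in leave_one_out_splits(DATASETS):
    print(f"Train on {'&'.join(train)}, test on {test}")
```

This yields the four standard protocols (e.g. C&I&O to M), so every dataset serves as the unseen target domain exactly once.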
```
└── YOUR_Data_Dir
    ├── OULU_NPU
    │   ├── trainset
    │   ├── testset
    │   ├── trainset.csv
    │   └── testset.csv
    ...
    └── REPLAY_ATTACK
        ├── trainset
        ...
        └── testset.csv
```
The trainset.csv and testset.csv files follow the format below:
| Index | Dataset | Path | Type | Attack | Label |
|---|---|---|---|---|---|
| 0 | MSU-MFSD | path/to/data1 | train | replay | 0 |
| 1 | MSU-MFSD | path/to/data2 | train | real | 1 |
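A minimal sketch of parsing this index with the standard library, assuming the column conventions shown in the table above (real = 1, spoof = 0); the sample rows are copied from the table and the helper name is ours:

```python
import csv
import io

# Sample index in the README's CSV format (header and rows as documented).
sample = """Index,Dataset,Path,Type,Attack,Label
0,MSU-MFSD,path/to/data1,train,replay,0
1,MSU-MFSD,path/to/data2,train,real,1
"""

def load_index(text):
    """Return (path, label) pairs from a trainset.csv/testset.csv index."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Path"], int(row["Label"])) for row in reader]

print(load_index(sample))  # [('path/to/data1', 0), ('path/to/data2', 1)]
```

When reading the real files, replace the `StringIO` buffer with `open("trainset.csv", newline="")`.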
The code is built on CLIP, BLIP, and SSAN. Thanks for their great work!
