AntiFraud

A Financial Fraud Detection Framework.

Source code implementations of the following papers:

MCNN: Credit card fraud detection using convolutional neural networks, in ICONIP 2016
STAN: Spatio-temporal attention-based neural network for credit card fraud detection, in AAAI 2020
STAGN: Graph Neural Network for Fraud Detection via Spatial-temporal Attention, in TKDE 2020
GTAN: Semi-supervised Credit Card Fraud Detection via Attribute-driven Graph Representation, in AAAI 2023
RGTAN: Enhancing Attribute-driven Fraud Detection with Risk-aware Graph Representation
GGTAN: A Novel GAT-Enhanced Gated Temporal Attention Network for Advanced Fraud Detection in Financial Transactions
- Project Link

Usage

Data processing

Run unzip /data/Amazon.zip, unzip /data/YelpChi.zip, unzip /data/S-FFSD.zip and unzip /data/IBM.zip to extract the datasets;
Run python feature_engineering/data_process.py to pre-process all datasets needed in this repo.
- If you only want to use the S-FFSD and IBM datasets with the GTAN and GGTAN methods, run python feature_engineering/data_process_ggtan.py
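
On systems without an unzip command, the extraction step can be approximated in Python. This is a minimal sketch, not part of the repository: the archive names come from the step above, while the function name extract_datasets and the assumption that the archives sit in a data/ directory under the repository root are illustrative.

```python
import zipfile
from pathlib import Path

# Archive names as listed in the data processing step; the data/ location
# relative to the repository root is an assumption.
ARCHIVES = ["Amazon.zip", "YelpChi.zip", "S-FFSD.zip", "IBM.zip"]


def extract_datasets(data_dir="data"):
    """Extract every listed archive found in data_dir; return the names extracted."""
    data_path = Path(data_dir)
    extracted = []
    for name in ARCHIVES:
        archive = data_path / name
        if not archive.is_file():
            # Missing archives are reported and skipped rather than treated as fatal.
            print(f"skipping {archive}: not found")
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_path)
        extracted.append(name)
    return extracted
```

Calling extract_datasets() from the repository root would then unpack whichever of the four archives are present.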

Training & Evaluation

To test implementations of MCNN, STAN and STAGN, run

python main.py --method mcnn
python main.py --method stan
python main.py --method stagn

Configuration files can be found in config/mcnn_cfg.yaml, config/stan_cfg.yaml and config/stagn_cfg.yaml, respectively.

Models in GTAN, RGTAN and GGTAN can be run via:

python main.py --method gtan
python main.py --method rgtan
python main.py --method ggtan

For hyperparameter settings, please refer to config/gtan_cfg.yaml, config/rgtan_cfg.yaml, and config/ggtan_cfg.yaml.
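
To run several methods back to back, the commands above can be wrapped in a small driver. The following is a sketch under assumptions: the helper names build_command and run_all are hypothetical, and the script is assumed to be started from the repository root so that main.py resolves. Only the --method flag shown above is used.

```python
import subprocess
import sys

# Method names as they appear in the commands above.
METHODS = ["mcnn", "stan", "stagn", "gtan", "rgtan", "ggtan"]


def build_command(method):
    """Return the argv list for one run of main.py with the given method."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    # sys.executable keeps the run inside the currently active Python environment.
    return [sys.executable, "main.py", "--method", method]


def run_all(methods=METHODS):
    """Run each method in turn; check=True stops at the first failing run."""
    for method in methods:
        subprocess.run(build_command(method), check=True)
```

For example, run_all(["gtan", "ggtan"]) would train and evaluate only those two models, one after the other.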

Data Description

This repository utilizes four datasets for model experiments: YelpChi, Amazon, S-FFSD, and IBM.

YelpChi and Amazon Datasets

These datasets are sourced from CARE-GNN and the original data can be found in this repository.

S-FFSD Dataset

S-FFSD is a simulated, smaller version of a financial fraud semi-supervised dataset. Its description is as follows:

Name	Type	Range	Note
Time	np.int32	0 to N	N: Number of transactions.
Source	string	S_0 to S_ns	ns: Number of transaction senders.
Target	string	T_0 to T_nt	nt: Number of transaction receivers.
Amount	np.float32	0.00 to np.inf	Transaction amount.
Location	string	L_0 to L_nl	nl: Number of transaction locations.
Type	string	TP_0 to TP_np	np: Number of different transaction types.
Labels	np.int32	0 to 2	2 denotes 'unlabeled'.
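
Because label 2 marks unlabeled transactions, a semi-supervised pipeline typically needs to separate labeled from unlabeled rows first. The sketch below illustrates that split with the standard library only. The column names follow the table above; the sample rows are made up for illustration, and the helper name split_by_label is hypothetical rather than taken from this repository.

```python
import csv
import io

UNLABELED = 2  # per the table above, label 2 denotes 'unlabeled'


def split_by_label(rows):
    """Split transaction rows into (labeled, unlabeled) lists using the Labels column."""
    labeled, unlabeled = [], []
    for row in rows:
        target = unlabeled if int(row["Labels"]) == UNLABELED else labeled
        target.append(row)
    return labeled, unlabeled


# Tiny made-up sample using the column names from the table above.
SAMPLE = """Time,Source,Target,Amount,Location,Type,Labels
0,S_0,T_0,12.50,L_0,TP_0,0
1,S_1,T_0,980.00,L_1,TP_1,1
2,S_0,T_1,45.10,L_0,TP_0,2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
labeled, unlabeled = split_by_label(rows)
print(len(labeled), len(unlabeled))  # 2 1
```

Only the labeled rows can contribute to a supervised loss; the unlabeled rows still carry Source, Target, Location and Type values that link them to other transactions.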

IBM Credit Card Transaction Dataset

This is a publicly available synthetic dataset for fraud detection research, containing simulated transaction data provided by IBM.

Dataset Highlights

Total Transactions: 24 million unique transactions.
Unique Merchants: 6,000.
Unique Cards: 100,000.
Fraudulent Transactions: 30,000 samples (approximately 0.1% of total transactions).
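
As a quick check of the figures above, the fraud rate and the resulting class ratio can be computed directly. The inputs are the rounded totals from the list, so the results are approximate as well.

```python
total_transactions = 24_000_000  # "24 million" from the list above
fraud_transactions = 30_000

fraud_rate = fraud_transactions / total_transactions
legit_per_fraud = (total_transactions - fraud_transactions) / fraud_transactions

print(f"fraud rate: {fraud_rate:.3%}")            # fraud rate: 0.125%
print(f"legit per fraud: {legit_per_fraud:.0f}")  # legit per fraud: 799
```

The quoted 0.1% is therefore a rounded value, and there are roughly 800 legitimate transactions for every fraudulent one.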

Key Characteristics

Class Imbalance: Non-fraudulent transactions far outnumber fraudulent ones, reflecting real-world scenarios.
Fraud Labels: Indicate whether a transaction is fraudulent.
Data Nature: Synthetic, not linked to real customers or financial institutions.

Data Accessibility

Local Download: IBM Dataset Link
Kaggle: Kaggle Dataset Link

Usage

Our experiments use a sample of approximately 100,000 transactions from this dataset.

Seeking more public datasets for interesting studies! Suggestions are welcome.

Model Performance Summary on S-FFSD and IBM Datasets

Model	S-FFSD AUC	S-FFSD F1	S-FFSD AP	IBM AUC	IBM F1	IBM AP
XGB	0.7931	0.6512	0.4830	0.9272	0.8941	0.8111
MCNN	0.7129	0.6861	0.3309	0.8771±0.001	0.7814±0.004	0.4084±0.007
GAT	0.7302±0.005	0.6147±0.006	-	0.9256±0.001	0.8325±0.025	-
GTAN	0.8286	0.7336	0.6585	0.9140±0.010	0.6959±0.059	0.5424±0.040
GGTAN	0.8951±0.003	0.7853±0.006	0.7530±0.006	0.9952±0.000	0.9496±0.002	0.9646±0.002
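
For readers unfamiliar with the metrics in the table, the sketch below computes AUC and F1 from scratch on a toy example. It is not the evaluation code of this repository. In practice scikit-learn (listed under Requirements) provides roc_auc_score, f1_score and average_precision_score. The averaging mode behind the reported F1 values is not stated here, so this sketch assumes binary F1 for the fraud class.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (fraud, non-fraud) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    # Pairwise comparison is quadratic in size, which is fine for a small illustration.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_binary(labels, preds):
    """F1 for the positive (fraud) class from hard 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
preds = [int(s >= 0.5) for s in scores]  # 0.5 threshold chosen for illustration

print(roc_auc(labels, scores))            # 0.75
print(round(f1_binary(labels, preds), 4)) # 0.6667
```

AUC depends only on how the scores rank the samples, while F1 depends on the chosen decision threshold, which is one reason a model can rank well on one metric and poorly on the other.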

Repo Structure

The repository is organized as follows:

models/: pre-trained models for each method. You can either train the models yourself or use the pre-trained models directly;
data/: dataset files;
config/: configuration files for different models;
feature_engineering/: data processing;
methods/: implementations of models;
main.py: organizes all models;
requirements.txt: package dependencies.

Requirements

python           3.7
scikit-learn     1.0.2
pandas           1.3.5
numpy            1.21.6
networkx         2.6.3
scipy            1.7.3
torch            1.12.1+cu113
dgl-cu113        0.8.1
torch_geometric  2.4.0

Citing

If you find AntiFraud useful for your research, please consider citing the following papers:

@inproceedings{Xiang2023SemiSupervisedCC,
    title={Semi-supervised Credit Card Fraud Detection via Attribute-driven Graph Representation},
    author={Sheng Xiang and Mingzhi Zhu and Dawei Cheng and Enxia Li and Ruihui Zhao and Yi Ouyang and Ling Chen and Yefeng Zheng},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    year={2023}
}
@article{cheng2020graph,
    title={Graph Neural Network for Fraud Detection via Spatial-temporal Attention},
    author={Cheng, Dawei and Wang, Xiaoyang and Zhang, Ying and Zhang, Liqing},
    journal={IEEE Transactions on Knowledge and Data Engineering},
    year={2020},
    publisher={IEEE}
}
@inproceedings{cheng2020spatio,
    title={Spatio-temporal attention-based neural network for credit card fraud detection},
    author={Cheng, Dawei and Xiang, Sheng and Shang, Chencheng and Zhang, Yiyi and Yang, Fangzhou and Zhang, Liqing},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={34},
    number={01},
    pages={362--369},
    year={2020}
}
@inproceedings{fu2016credit,
    title={Credit card fraud detection using convolutional neural networks},
    author={Fu, Kang and Cheng, Dawei and Tu, Yi and Zhang, Liqing},
    booktitle={International Conference on Neural Information Processing},
    pages={483--490},
    year={2016},
    organization={Springer}
}
