NNG-Mix

This repository contains the implementation of the paper:

NNG-Mix: Improving Semi-supervised Anomaly Detection with Pseudo-anomaly Generation
Hao Dong, Gaëtan Frusque, Yue Zhao, Eleni Chatzi and Olga Fink
The paper is available on arXiv: https://arxiv.org/abs/2311.11961

We investigate improving semi-supervised anomaly detection from a novel viewpoint: generating additional pseudo-anomalies based on the limited labeled anomalies and a large amount of unlabeled data. We introduce NNG-Mix, a simple and effective pseudo-anomaly generation algorithm that optimally utilizes information from both labeled anomalies and unlabeled data.

Nearest Neighbor Gaussian Mixup (NNG-Mix) makes good use of information from both labeled anomalies and unlabeled data to generate pseudo-anomalies effectively.
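As a rough illustration of this generation step, here is a minimal NumPy sketch (the function name, the pool-selection rule, and other details are hypothetical simplifications; see NNG_Mix.py and the paper for the actual implementation):

import numpy as np

def nng_mix(anomalies, unlabeled, n_generate, nn_k=10, nn_k_anomaly=10,
            mixup_alpha=0.2, mixup_beta=0.2, gaussian_std=0.01, seed=0):
    # Hypothetical sketch: for each pseudo-anomaly, pick a labeled anomaly,
    # pick one of its nearest neighbors (from the anomaly set or the
    # unlabeled pool), mix the pair with a Beta-distributed coefficient,
    # and perturb the mixture with Gaussian noise.
    rng = np.random.default_rng(seed)
    pseudo = []
    for _ in range(n_generate):
        anchor = anomalies[rng.integers(len(anomalies))]
        # Choose the neighbor pool: labeled anomalies or unlabeled data.
        if rng.random() < 0.5:
            pool, k = anomalies, min(nn_k_anomaly, len(anomalies))
        else:
            pool, k = unlabeled, min(nn_k, len(unlabeled))
        # Sample one of the anchor's k nearest neighbors in that pool.
        dists = np.linalg.norm(pool - anchor, axis=1)
        neighbor = pool[rng.choice(np.argsort(dists)[:k])]
        # Mixup with a Beta(alpha, beta) coefficient, then Gaussian noise
        # (cf. --nn_mix_gaussian and --nn_mix_gaussian_std below).
        lam = rng.beta(mixup_alpha, mixup_beta)
        x = lam * anchor + (1 - lam) * neighbor
        pseudo.append(x + rng.normal(0.0, gaussian_std, size=x.shape))
    return np.stack(pseudo)

The generated pseudo-anomalies are added to the labeled anomaly set before training the detector (e.g. DeepSAD or an MLP) with the commands below.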

Dataset

Download the Classical, CV_by_ResNet18, and NLP_by_BERT datasets from ADBench and place them in the datasets/ folder.
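
After downloading, the layout should look roughly as follows (exact file names depend on the ADBench release):

datasets/
    Classical/
    CV_by_ResNet18/
    NLP_by_BERT/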

Code

Change --ratio 1.0 to --ratio 0.5 or --ratio 0.1 to train with 5% or 1% of the available labeled anomalies, respectively.
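
For example, the first Classical command below becomes the following when only 1% of the labeled anomalies are used:

python NNG_Mix.py --ratio 0.1 --method nng_mix --seed 0 --alg DeepSAD --dataset Classical --nn_k 10 --nn_k_anomaly 10 --nn_mix_gaussian --nn_mix_gaussian_std 0.01 --mixup_alpha 0.2 --mixup_beta 0.2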

Classical Dataset

Train on Classical datasets with 10% available labeled anomalies using DeepSAD

python NNG_Mix.py --ratio 1.0 --method nng_mix --seed 0 --alg DeepSAD --dataset Classical --nn_k 10 --nn_k_anomaly 10 --nn_mix_gaussian --nn_mix_gaussian_std 0.01 --mixup_alpha 0.2 --mixup_beta 0.2

Train on Classical datasets with 10% available labeled anomalies using MLP

python NNG_Mix.py --ratio 1.0 --method nng_mix --seed 0 --alg MLP --dataset Classical --nn_k 10 --nn_k_anomaly 10 --nn_mix_gaussian --nn_mix_gaussian_std 0.01 --mixup_alpha 0.2 --mixup_beta 0.2

CV Dataset

Train on CV with 10% available labeled anomalies using DeepSAD

python NNG_Mix.py --ratio 1.0 --method nng_mix --seed 0 --alg DeepSAD --dataset CV --nn_k 10 --nn_k_anomaly 10 --nn_mix_gaussian --nn_mix_gaussian_std 0.01 --mixup_alpha 0.2 --mixup_beta 0.2

Train on CV with 10% available labeled anomalies using MLP

python NNG_Mix.py --ratio 1.0 --method nng_mix --seed 0 --alg MLP --dataset CV --nn_k 10 --nn_k_anomaly 10 --nn_mix_gaussian --nn_mix_gaussian_std 0.3 --mixup_alpha 0.2 --mixup_beta 0.2

NLP Dataset

Train on NLP with 10% available labeled anomalies using DeepSAD

python NNG_Mix.py --ratio 1.0 --method nng_mix --seed 0 --alg DeepSAD --dataset NLP --nn_k 10 --nn_k_anomaly 10 --nn_mix_gaussian --nn_mix_gaussian_std 0.01 --mixup_alpha 0.2 --mixup_beta 0.2

Train on NLP with 10% available labeled anomalies using MLP

python NNG_Mix.py --ratio 1.0 --method nng_mix --seed 0 --alg MLP --dataset NLP --nn_k 10 --nn_k_anomaly 10 --nn_mix_gaussian --nn_mix_gaussian_std 0.3 --mixup_alpha 0.2 --mixup_beta 0.2

Contact

If you have any questions, please send an email to donghaospurs@gmail.com.

Citation

If you find our work useful in your research, please consider citing our paper:

@article{dong2023nngmix,
	author   = {Hao Dong and Ga{\"e}tan Frusque and Yue Zhao and Eleni Chatzi and Olga Fink},
	title    = {{NNG-Mix: Improving Semi-supervised Anomaly Detection with Pseudo-anomaly Generation}},
	journal  = {arXiv preprint arXiv:2311.11961},
	year     = {2023},
}

Acknowledgement

Many thanks to the excellent open-source project ADBench.
