This repository contains the code used for the paper
Chenghao Li and Yuanyuan Lin. A data-augmented contrastive learning approach to nonparametric density estimation. Journal of Machine Learning Research. 2026, accepted.
The main requirements:
- Python>=3.11.3
- Pytorch>=2.0.1
The details can be found in requirements.txt.
You can create an virtual environment using:
$ conda create --name <env> --file requirements.txtCode has been tested on CUHK High-Performance Computing Cluster with NVIDIA Tesla V100(32GB)/Tesla P100(16GB) GPUs.
The main python script main_est.py is used for implementing data augmented nonparamteric noise contrastive estimation in 2-D cases.
The neural network architeture, distribution models, and basic functions are defined in setup.py.
The following three models of data ditributions has been implemented in the code:
- Indepedent Gaussian mixture mdoel;
- Eight-octagon Gaussian mixture model;
- Involute.
Before running main_est.py, you may specfiy
-
[rho]: The parameter of data augmentation,$\rho$ ; -
[model_type]: the distribution model type, one of 'indep_GMM', 'octagon_GMM', or 'involute'. The configuration files of all parameters related to network training are in theconfigsfolder.
The script main_est_multi_dim.py is used for implementing data augmented nonparamteric noise contrastive estimation in higher dimensional cases.
Before running main_est_multi_dim.py, you may specfiy
-
[d]: the dimension; -
[no_augmentation]: whether to use data augmentaion, that is, whether to set the value of$\rho$ to be 0.
The scripts main_est_varrho.py and main_est_varrho_md.py are used for comparing the performance of data augmented nonparamteric noise contrastive estimation under different
[d]: the dimension, exclusive formain_est_varrho_md.py;[rho_list]: the sequence of rho used for comparison.
The scripts main_est_anomaly_detection.py is used for applying data augmented nonparamteric noise contrastive estimation to two anomaly/outlier detection datasets: Shuttle and Mammography.
Both datasets are availiable from the ODDS Database.
Before running the scripts, you may specfiy
[dataset_name]: the name of datset, one of 'Shuttle' and 'Mammography'.
The scripts visualization_2d.py and visualization_line_chart are used for visualizing true and estimated densities in 2-D cases, and generating the line chart for comparing the performance under different [data_folder].